Vegan-specific signature implies healthier metabolic profile: findings from diet-related multi-omics observational study based on different European populations
Statistical report for pathway analysis
Authors and affiliations
Anna Ouradova1,*, Giulio Ferrero2,3,*, Miriam Bratova4, Nikola Daskova4, Alena Bohdanecka4,5, Klara Dohnalova6, Marie Heczkova4, Karel Chalupsky6, Maria Kralova7,8, Marek Kuzma9, István Modos4, Filip Tichanek4, Lucie Najmanova9, Barbara Pardini10, Helena Pelantová9, Sonia Tarallo10, Petra Videnska11, Jan Gojda1,#, Alessio Naccarati10,#, Monika Cahova4,#
* These authors have contributed equally to this work and share first authorship
# These authors have contributed equally to this work and share last authorship
1 Department of Internal Medicine, Kralovske Vinohrady University Hospital and Third Faculty of Medicine, Charles University, Prague, Czech Republic
2 Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
3 Department of Computer Science, University of Turin, Turin, Italy
4 Institute for Clinical and Experimental Medicine, Prague, Czech Republic
5 First Faculty of Medicine, Charles University, Prague, Czech Republic
6 Czech Centre for Phenogenomics, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
7 Ambis University, Department of Economics and Management, Prague, Czech Republic
8 Department of Informatics, Brno University of Technology, Brno, Czech Republic
9 Institute of Microbiology of the Czech Academy of Sciences, Prague, Czech Republic
10 Italian Institute for Genomic Medicine (IIGM), c/o IRCCS Candiolo, Turin, Italy
11 Mendel University, Department of Chemistry and Biochemistry, Brno, Czech Republic
This is a statistical report of the study A vegan diet signature from a multi-omics study on different European populations is related to favorable metabolic outcomes that is currenlty under review
When using this code or data, cite the original publication:
TO BE ADDED
BibTex citation for the original publication:
TO BE ADDED
Original GitHub repository: https://github.com/filip-tichanek/ItCzVegans
Statistical reports can be found on the reports hub.
Data analysis is described in detail in the statistical methods report.
1 Introduction
This project explores potential signatures of a vegan diet across the microbiome, metabolome, and lipidome. We used data from healthy vegan and omnivorous human subjects in two countries (Czech Republic and Italy), with subjects grouped by Country and Diet, resulting in four distinct groups.
To assess the generalizability of these findings, we validated our results with an independent cohort from the Czech Republic for external validation.
1.1 Statistical Methods
The statistical modeling approach is described in detail in this report. Briefly, the methods used included:
Multivariate analysis: We conducted multivariate analyses (PERMANOVA, PCA, correlation analyses) to explore the effects of
diet,country, and their possible interaction (diet : country) on the microbiome, lipidome, and metabolome compositions in an integrative manner. This part of the analysis is not available on the GitHub page, but the code will be provided upon request.Linear models: Linear models were applied to estimate the effects of
diet,country, and their interaction (diet:country) on individual lipids, metabolites, bacterial taxa and pathways (“features”). Features that significantly differed between diet groups (based on the estimated average conditional effect of diet across both countries, adjusted for multiple comparisons with FDR < 0.05) were further examined in the independent external validation cohort to assess whether these associations were reproducible. Next, we fit linear models restricted to vegan participants to test whether omics profiles varied with the duration of vegan diet. Fixed-effect predictors were diet duration (per 10 years), country, their interaction, and age (included due to correlation with diet duration).Predictive models (elastic net): We employed elastic net logistic regression (via the
glmnetR package) to predict vegan status based on metabolome, lipidome, microbiome and pathways data (one model per dataset; four models in total). We considered three combinations of Lasso and Ridge penalties (alpha = 0, 0.2, 0.4). For each alpha, we selected the penalty strength (λ1se) using 10-fold cross-validation. This value corresponds to the most regularized model whose performance was within one standard error of the minimum deviance. The alpha–lambda pair with the lowest deviance was chosen to fit the final model, whose coefficients are reported.
To estimate model performance, we repeated the full modeling procedure (including hyperparameter tuning) 500 times on bootstrap resamples of the training data. In each iteration, the model was trained on the resampled data and evaluated on the out-of-bag subjects (i.e., those not included in the training set in that iteration). The mean, and 2.5th, and 97.5th percentiles of the resulting ROC-AUC values represent the estimated out-of-sample AUC and its 95% confidence interval.
Finally, the final model was applied to an independent validation cohort to generate predicted probabilities of vegan status. These probabilities were then used to assess external discrimination between diet groups (ROC-AUC in the independent validation cohort). The elastic net models were not intended for practical prediction, but to quantify the strength of the signal separating the dietary groups, with its uncertainty, by using all features of a given dataset jointly. It also offered a complementary perspective on which features are most clearly associated with diet
2 Initiation
2.1 Set home directory
Open code
setwd('/home/ticf/secured_data/GitRepo/ticf/478_MOCA_italian')2.2 Upload initiation file
Open code
source('478_initiation.R')3 Data
3.1 Upload and epxlore
3.1.1 Get pathways
Only these pathways that have non-zero values in all observation in at least one dataset will be chosen
Open code
data_path_originalCZ <- read.delim(
"gitignore/data/pathw/Pathway_abundance_MetaCyc_CZ_humann.tsv",
header = TRUE, sep = "\t"
) %>%
filter(!grepl("\\|", X..Pathway)) %>%
select(X..Pathway)
dim(data_path_originalCZ)
## [1] 580 1
data_path_originalIT <- read.delim(
"gitignore/data/pathw/Pathway_abundance_MetaCyc_IT_humann.tsv",
header = TRUE, sep = "\t"
) %>%
filter(!grepl("\\|", X..Pathway)) %>%
select(X..Pathway)
dim(data_path_originalIT)
## [1] 488 1
data_path_validation <- read.delim(
"gitignore/data/pathw/Pathway_abundance_MetaCyc_Validation_humann.tsv",
header = TRUE, sep = "\t"
) %>%
filter(!grepl("\\|", X..Pathway)) %>%
select(X..Pathway)
dim(data_path_validation)
## [1] 580 1
tr <- intersect(
data_path_originalCZ$X..Pathway,
data_path_originalIT$X..Pathway
)
paths <- intersect(tr, data_path_validation$X..Pathway)
length(paths)
## [1] 4813.1.2 Italian data
Open code
data_path_originalIT <- read.delim(
'gitignore/data/pathw/Pathway_abundance_MetaCyc_IT_humann.tsv',
header = TRUE, sep = "\t"
) %>%
filter(X..Pathway %in% paths)
features <- data_path_originalIT[,1]
split_features <- strsplit(features, ": ")
feature_name <- sapply(split_features, `[`, 1)
feature_description <- sapply(split_features, `[`, 2)
feature_name <- gsub(" |-", "_", feature_name)
row.names(data_path_originalIT) <- c(feature_name)
data_path_originalIT <- data.frame(t(data_path_originalIT[,-1]))
attr(data_path_originalIT, "description") <- feature_description
originalIT_features <- data_path_originalIT%>%
select(where(~ mean(. == 0) < 0.7)) %>%
select(-UNMAPPED, -UNINTEGRATED) %>%
colnames()3.1.3 Czech data
Open code
data_path_originalCZ <- read.delim(
'gitignore/data/pathw/Pathway_abundance_MetaCyc_CZ_humann.tsv',
header = TRUE, sep = "\t"
) %>%
filter(X..Pathway %in% paths)
features <- data_path_originalCZ[,1]
split_features <- strsplit(features, ": ")
feature_name <- sapply(split_features, `[`, 1)
feature_description <- sapply(split_features, `[`, 2)
feature_name <- gsub(" |-", "_", feature_name)
row.names(data_path_originalCZ) <- c(feature_name)
data_path_originalCZ <- data.frame(t(data_path_originalCZ[,-1]))
attr(data_path_originalCZ, "description") <- feature_description
originalCZ_features <- data_path_originalCZ %>%
select(where(~ mean(. == 0) < 0.7)) %>%
select(-UNMAPPED, -UNINTEGRATED) %>%
colnames()3.1.4 Validation data
Open code
data_pathways_validation <- read.delim(
'gitignore/data/pathw/Pathway_abundance_MetaCyc_Validation_humann.tsv',
header = TRUE,
sep = "\t"
) %>%
filter(X..Pathway %in% paths)
features <- data_pathways_validation[,1]
split_features <- strsplit(features, ": ")
feature_name <- sapply(split_features, `[`, 1)
feature_description <- sapply(split_features, `[`, 2)
feature_name <- gsub(" |-", "_", feature_name)
row.names(data_pathways_validation) <- c(feature_name)
data_pathways_validation <- data.frame(t(data_pathways_validation[,-1]))
attr(data_pathways_validation, "description") <- feature_description
row.names(data_pathways_validation) <- gsub("^X", "K", row.names(data_pathways_validation))
validation_features <- data_pathways_validation %>%
select(-UNMAPPED, -UNINTEGRATED) %>%
colnames()3.1.5 Merging data
Modify data
Open code
# which taxa
set.seed(478)
features <- intersect(originalIT_features, originalCZ_features)
features <- intersect(features, validation_features)
# CZ
data_path_originalCZ_filtered <- data_path_originalCZ %>%
select(any_of(features))
data_path_originalCZ_filtered <- data_path_originalCZ_filtered/ rowSums(data_path_originalCZ_filtered)
data_path_originalCZ_filtered <- data_path_originalCZ_filtered %>%
mutate(
ID = row.names(.),
Country = 'CZ'
) %>%
select(ID, Country, any_of(features))
# IT
data_path_originalIT_filtered <- data_path_originalIT %>%
select(any_of(features))
data_path_originalIT_filtered <- data_path_originalIT_filtered/ rowSums(data_path_originalIT_filtered)
data_path_originalIT_filtered <- data_path_originalIT_filtered %>%
mutate(
ID = row.names(.),
Country = 'IT'
) %>%
select(ID, Country, any_of(features))
# joining the table
data_path_original_filtered <- bind_rows(data_path_originalIT_filtered,
data_path_originalCZ_filtered)
bacteria_data <- data_path_original_filtered %>%
select(all_of(features))
if (file.exists("gitignore/data_path_original_impCLR.RData") == FALSE) {
bacteria_data_imp <- lrSVD(
bacteria_data,
label = 0,
dl = NULL,
z.delete = FALSE,
ncp = 2
)
row.names( bacteria_data_imp) <- row.names(bacteria_data)
bacteria_data <- data.frame(clr(bacteria_data_imp)) %>%
mutate(ID = row.names(.))
training_metadata <- read.xlsx("gitignore/data/lipidome_training_cohort.xlsx") %>%
select(Sample, Diet) %>%
mutate(ID = Sample) %>%
select(-Sample)
data_path_original_filtered <- data_path_original_filtered %>%
select(ID, Country) %>%
left_join(bacteria_data, by = "ID")
data_pathways_original_clr <- data_path_original_filtered %>%
mutate(
Data = if_else(Country == "CZ", "CZ_tr", "IT_tr")
) %>%
left_join(training_metadata, by = "ID") %>%
select(ID, Diet, Country, Data, everything())
save(
data_pathways_original_clr,
file = "gitignore/data_path_original_impCLR.RData"
)
}
load("gitignore/data_path_original_impCLR.RData")
if (file.exists("gitignore/data_pathways_original_impCLR.csv") == FALSE)
write.csv(data_pathways_original_clr,
"gitignore/data_pathways_original_impCLR.csv")
## Show variances of CLR proportions across samples
data_variance <- data_pathways_original_clr %>%
rowwise() %>%
mutate(variance = var(c_across(-(ID:Data)))) %>%
ungroup() %>%
select(ID, variance)
## Look at distribution
hist(data_variance$variance)Open code
## Show extreme samples
data_variance %>% arrange(desc(variance))
## # A tibble: 166 × 2
## ID variance
## <chr> <dbl>
## 1 T173 11.0
## 2 T181 9.32
## 3 T182 8.98
## 4 VOV004 8.57
## 5 VOV100 8.54
## 6 T176 8.49
## 7 VOV042 8.48
## 8 VOV088 8.48
## 9 VOV006 8.46
## 10 T189 8.45
## # ℹ 156 more rowsGet diet informationDiet` from another dataset for validating data
Open code
validation_metadata <- read.xlsx('gitignore/data/lipidome_validation_cohort.xlsx') %>%
select(X1, X2) %>%
mutate(ID = X1,
Diet = X2) %>%
select(-X1, -X2)
data_pathways_validation_filtered <- data_pathways_validation %>%
select(any_of(features))
set.seed(478)
data_pathways_validation_filtered <- (data_pathways_validation_filtered/
rowSums(data_pathways_validation_filtered ))
if (file.exists("gitignore/data_pathways_validation_impCLR.RData") == FALSE) {
data_pathways_validation_filtered_imp <- lrSVD(
data_pathways_validation_filtered,
label = 0, dl = NULL, z.delete = FALSE
)
row.names(data_pathways_validation_filtered_imp) <- row.names(data_pathways_validation_filtered)
data_pathways_validation_clr <- data.frame(
clr(data_pathways_validation_filtered_imp)
) %>%
mutate(
ID = row.names(.),
Country = "CZ",
Data = "valid"
) %>%
left_join(validation_metadata, by = "ID") %>%
select(ID, Diet, Country, Data, any_of(features)) %>%
filter(!is.na(Diet))
## Add Diet for K284 which has the diet missing
data_pathways_validation_clr[
which(data_pathways_validation_clr$ID == "K284"), "Diet"
] <- "VEGAN"
save(
data_pathways_validation_clr,
file = "gitignore/data_pathways_validation_impCLR.RData"
)
}
load("gitignore/data_pathways_validation_impCLR.RData")
if (file.exists("gitignore/data_pathways_validation_impCLR.csv") == FALSE)
write.csv(data_pathways_validation_clr,
"gitignore/data_pathways_validation_impCLR.csv")
## Show variances of CLR proportions across samples
data_variance <- data_pathways_validation_clr %>%
rowwise() %>%
mutate(variance = var(c_across(-(ID:Data)))) %>%
ungroup() %>%
select(ID, variance)
## Look at distribution
hist(data_variance$variance)Open code
## Show extreme samples
data_variance %>% arrange(desc(variance))
## # A tibble: 101 × 2
## ID variance
## <chr> <dbl>
## 1 K16 7.90
## 2 K55 7.38
## 3 K74 7.36
## 4 K119 7.29
## 5 K38 7.27
## 6 K12 7.22
## 7 K61 7.13
## 8 K292 7.10
## 9 K82 7.07
## 10 K64 7.02
## # ℹ 91 more rows3.1.6 Merge training and validation dataset
Open code
data_merged <- bind_rows(
data_pathways_original_clr,
data_pathways_validation_clr
)3.2 Explore
3.2.0.1 Distributions
The following plot will show distribution of 36 randomly selected pathways
Open code
size = c(6,6)
check <- data_pathways_original_clr[, 5:ncol(data_pathways_original_clr)]
check <- check[, sample(1:ncol(check), size[1]*size[2])]
par(mfrow = c(size[1],size[2]))
par(mar=c(2,1.5,2,0.5))
for(x in 1:ncol(check)){
hist(check[,x],
16,
col='blue',
main = paste0(colnames(check)[x])
)
}Data seems to have relatively symmetric distribution
3.2.0.2 Pathways accross groups
Open code
set.seed(478)
colo <- c('#F9FFAF','#329243')
outcomes <- data.frame(
variable = data_merged %>%
select(any_of(sample(features, 35))) %>%
colnames()
)
boxplot_cond <- function(variable) {
p <- ggboxplot(data_merged,
x = 'Diet',
y = variable,
fill = 'Diet',
tip.length = 0.15,
palette = colo,
outlier.shape = 1,
lwd = 0.25,
outlier.size = 0.8,
facet.by = 'Data',
title = variable,
ylab = 'pathway level') +
theme(
plot.title = element_text(size = 10),
axis.title = element_text(size = 8),
axis.text.y = element_text(size = 7),
axis.text.x = element_blank(),
axis.title.x = element_blank()
)
return(p)
}
# Plot all outcomes
plots <- map(outcomes$variable, boxplot_cond)
# Create a matrix of plots
plots_arranged <- ggarrange(plotlist = plots, ncol = 5, nrow = 7, common.legend = TRUE)
plots_arranged4 Linear models across pathways
We will fit a feature-specific linear model where the clr-transformed pathway represents the outcome variable whereas country (Italy vs Czech), diet (vegan vs omnivore), and their interaction (country:diet) all represent fixed-effects predictors. So, each model has following form
\[ clr(\text{pathway level}) = \alpha + \beta_{1} \times \text{country} + \beta_{2} \times \text{diet} + \beta_{3} \times \text{country:diet} + \epsilon \]
The variables were coded as follows: \(diet = -0.5\) for omnivores and \(diet = 0.5\) for vegans; \(country = -0.5\) for the Czech cohort and \(country = 0.5\) for the Italian cohort.
This parameterization allows us to interpret the linear model summary output as presenting the conditional effects of diet averaged across both countries and the conditional effects of country averaged across both diet groups. We will then use the emmeans package (Lenth 2024) to obtain specific estimates for the effect of diet in the Italian and Czech cohorts separately, still from a single model.
pathways that will show a significant diet effect (average effect of diet across both countries, adjusted for multiple comparisons with FDR < 0.05) will be then visualized using a forest plot, with country-specific diet effect along with diet effect based on independent validation cohort, to evaluate how generalizable are these findings.
4.1 Select and wrangle data
Open code
data_analysis_pathways <- data_pathways_original_clr %>%
na.omit() %>%
dplyr::mutate(
Diet_VEGAN = as.numeric(
dplyr::if_else(
Diet == "VEGAN", 0.5, -0.5
)
),
Country_IT = as.numeric(
dplyr::if_else(
Country == "IT", 0.5, -0.5
)
),
) %>%
dplyr::select(
ID,
Country,
Country_IT,
Diet,
Diet_VEGAN,
dplyr::everything()
)
summary(data_analysis_pathways[ , 1:12])
## ID Country Country_IT Diet
## Length:166 Length:166 Min. :-0.50000 Length:166
## Class :character Class :character 1st Qu.:-0.50000 Class :character
## Mode :character Mode :character Median :-0.50000 Mode :character
## Mean :-0.03012
## 3rd Qu.: 0.50000
## Max. : 0.50000
## Diet_VEGAN Data X1CMET2_PWY ALLANTOINDEG_PWY
## Min. :-0.50000 Length:166 Min. :1.273 Min. :-6.0996
## 1st Qu.:-0.50000 Class :character 1st Qu.:2.422 1st Qu.:-4.9392
## Median : 0.50000 Mode :character Median :2.728 Median :-3.4441
## Mean : 0.08434 Mean :2.663 Mean :-3.6060
## 3rd Qu.: 0.50000 3rd Qu.:3.016 3rd Qu.:-2.3188
## Max. : 0.50000 Max. :3.741 Max. :-0.6658
## ANAEROFRUCAT_PWY ANAGLYCOLYSIS_PWY ARG.POLYAMINE_SYN ARGININE_SYN4_PWY
## Min. :0.1183 Min. :1.490 Min. :-2.72033 Min. :-2.4001
## 1st Qu.:1.5725 1st Qu.:2.407 1st Qu.: 0.07355 1st Qu.: 0.7401
## Median :1.9835 Median :2.770 Median : 0.37284 Median : 1.4195
## Mean :1.8992 Mean :2.710 Mean : 0.35348 Mean : 1.3716
## 3rd Qu.:2.2510 3rd Qu.:3.023 3rd Qu.: 0.77630 3rd Qu.: 2.1978
## Max. :3.1912 Max. :4.664 Max. : 1.97414 Max. : 3.0797
data_analysis_pathways[1:5 , 1:8]
## ID Country Country_IT Diet Diet_VEGAN Data X1CMET2_PWY ALLANTOINDEG_PWY
## 1 VOV002 IT 0.5 OMNI -0.5 IT_tr 2.186503 -5.783155
## 2 VOV003 IT 0.5 OMNI -0.5 IT_tr 2.008170 -1.685330
## 3 VOV004 IT 0.5 OMNI -0.5 IT_tr 3.418194 -4.651552
## 4 VOV006 IT 0.5 OMNI -0.5 IT_tr 3.174203 -4.576071
## 5 VOV007 IT 0.5 OMNI -0.5 IT_tr 1.479598 -2.2907164.1.1 Define number of pathways and covariates
Open code
n_covarites <- 6
n_features <- ncol(data_analysis_pathways) - n_covarites4.1.2 Create empty objects
Open code
outcome <- vector('double', n_features)
est_VGdiet_inCZ <- vector('double', n_features)
est_VGdiet_inIT <- vector('double', n_features)
est_VGdiet_avg <- vector('double', n_features)
est_ITcountry_avg <- vector('double', n_features)
diet_country_int <- vector('double', n_features)
P_VGdiet_inCZ <- vector('double', n_features)
P_VGdiet_inIT <- vector('double', n_features)
P_VGdiet_avg <- vector('double', n_features)
P_ITcountry_avg <- vector('double', n_features)
P_diet_country_int <- vector('double', n_features)
CI_L_VGdiet_inCZ <- vector('double', n_features)
CI_L_VGdiet_inIT <- vector('double', n_features)
CI_L_VGdiet_avg <- vector('double', n_features)
CI_U_VGdiet_inCZ <- vector('double', n_features)
CI_U_VGdiet_inIT <- vector('double', n_features)
CI_U_VGdiet_avg <- vector('double', n_features)4.2 Run linear models over pathways
Open code
for (i in 1:n_features) {
## define variable
data_analysis_pathways$outcome <- data_analysis_pathways[, (i + n_covarites)]
## fit model
model <- lm(outcome ~ Country_IT * Diet_VEGAN, data = data_analysis_pathways)
## get contrast (effects of diet BY COUNTRY)
contrast_emm <- summary(
pairs(
emmeans(
model,
specs = ~ Diet_VEGAN | Country_IT
),
interaction = TRUE,
adjust = "none"
),
infer = c(TRUE, TRUE)
)
## save results
outcome[i] <- names(data_analysis_pathways)[i + n_covarites]
## country effect
est_ITcountry_avg[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Country_IT"
), 1
]
P_ITcountry_avg[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Country_IT"
), 4
]
## diet effect
tr <- confint(model)
CI_L_VGdiet_avg[i] <- tr[which(row.names(tr) == 'Diet_VEGAN'),][1]
CI_U_VGdiet_avg[i] <- tr[which(row.names(tr) == 'Diet_VEGAN'),][2]
est_VGdiet_avg[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Diet_VEGAN"
), 1
]
P_VGdiet_avg[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Diet_VEGAN"
), 4
]
est_VGdiet_inCZ[i] <- -contrast_emm$estimate[1]
P_VGdiet_inCZ[i] <- contrast_emm$p.value[1]
CI_L_VGdiet_inCZ[i] <- -contrast_emm$upper.CL[1]
CI_U_VGdiet_inCZ[i] <- -contrast_emm$lower.CL[1]
est_VGdiet_inIT[i] <- -contrast_emm$estimate[2]
P_VGdiet_inIT[i] <- contrast_emm$p.value[2]
CI_L_VGdiet_inIT[i] <- -contrast_emm$upper.CL[2]
CI_U_VGdiet_inIT[i] <- -contrast_emm$lower.CL[2]
## interaction
diet_country_int[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Country_IT:Diet_VEGAN"
), 1
]
P_diet_country_int[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Country_IT:Diet_VEGAN"
), 4
]
}4.3 Results table
Open code
result_pathways <- data.frame(
outcome,
est_ITcountry_avg, P_ITcountry_avg,
est_VGdiet_avg, P_VGdiet_avg,
est_VGdiet_inCZ, P_VGdiet_inCZ,
est_VGdiet_inIT, P_VGdiet_inIT,
diet_country_int, P_diet_country_int,
CI_L_VGdiet_avg, CI_U_VGdiet_avg,
CI_L_VGdiet_inCZ, CI_U_VGdiet_inCZ,
CI_L_VGdiet_inIT, CI_U_VGdiet_inIT
)4.3.1 Adjust p values
Open code
result_pathways <- result_pathways %>%
dplyr::mutate(
fdr_ITcountry_avg = p.adjust(P_ITcountry_avg, method = 'BH'),
fdr_VGdiet_avg = p.adjust(P_VGdiet_avg, method = 'BH'),
fdr_VGdiet_inCZ = p.adjust(P_VGdiet_inCZ, method = 'BH'),
fdr_VGdiet_inIT = p.adjust(P_VGdiet_inIT, method = 'BH'),
fdr_diet_country_int = p.adjust(P_diet_country_int, method = 'BH')
) %>%
dplyr::select(
outcome,
est_ITcountry_avg, P_ITcountry_avg, fdr_ITcountry_avg,
est_VGdiet_avg, P_VGdiet_avg, fdr_VGdiet_avg,
est_VGdiet_inCZ, P_VGdiet_inCZ, fdr_VGdiet_inCZ,
est_VGdiet_inIT, P_VGdiet_inIT, fdr_VGdiet_inIT,
diet_country_int, P_diet_country_int, fdr_diet_country_int,
CI_L_VGdiet_avg, CI_U_VGdiet_avg,
CI_L_VGdiet_inCZ, CI_U_VGdiet_inCZ,
CI_L_VGdiet_inIT, CI_U_VGdiet_inIT
)4.3.2 Result: show and save
Open code
kableExtra::kable(result_pathways %>% filter(fdr_VGdiet_avg < 0.05),
caption = "Result of linear models, modelling CLR-transformed relative level of given fcuntional pathway with `Diet`, `Country` and `Diet x Country` interaction as fixed-effect predictors. Only the pathways that differ between vegans and omnivores in training cohorts (FDR < 0.05, average diet effect over both countries) are shown. `est` prefix: denotes estimated effects (regression coefficient), i.e. how much clr-transformed relative pathway level differ in vegans compared to omnivores, and in Italian vs Czech cohort, respectively; `P`: p-value, `fdr`: p-value after adjustment for multiple comparison, `CI_L` and `CI_U`: lower and upper bounds of 95% confidence interval respectively. `avg` suffix shows effect averaged across subgroups, whereas `inCZ` and `inIT` shows effect in Czech or Italian cohort respectively. Interaction effects are reported as `diet_country_int` (difference in the effect of vegan diet between Italian and Czech cohorts; positive values indicate a stronger effect in Italian, negative values a stronger effect in Czech cohort) and `P_diet_country_int` (its p-value) All estimates in a single row are based on a single model"
) | outcome | est_ITcountry_avg | P_ITcountry_avg | fdr_ITcountry_avg | est_VGdiet_avg | P_VGdiet_avg | fdr_VGdiet_avg | est_VGdiet_inCZ | P_VGdiet_inCZ | fdr_VGdiet_inCZ | est_VGdiet_inIT | P_VGdiet_inIT | fdr_VGdiet_inIT | diet_country_int | P_diet_country_int | fdr_diet_country_int | CI_L_VGdiet_avg | CI_U_VGdiet_avg | CI_L_VGdiet_inCZ | CI_U_VGdiet_inCZ | CI_L_VGdiet_inIT | CI_U_VGdiet_inIT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CALVIN_PWY | -0.0296002 | 0.7062271 | 0.7756810 | 0.2613223 | 0.0010627 | 0.0250528 | 0.3326251 | 0.0031400 | 0.0456284 | 0.1900196 | 0.0882619 | 0.9858171 | -0.1426055 | 0.3644016 | 0.5988432 | 0.1065206 | 0.4161240 | 0.1135788 | 0.5516714 | -0.0287793 | 0.4088185 |
| GALACTITOLCAT_PWY | -0.7049222 | 0.0030045 | 0.0114303 | -0.9807686 | 0.0000454 | 0.0022705 | -1.3052878 | 0.0001197 | 0.0038094 | -0.6562494 | 0.0489003 | 0.9858171 | 0.6490384 | 0.1673536 | 0.5246896 | -1.4428039 | -0.5187332 | -1.9590736 | -0.6515020 | -1.3092967 | -0.0032021 |
| HEXITOLDEGSUPER_PWY | -0.6345150 | 0.0033358 | 0.0122899 | -0.9086781 | 0.0000337 | 0.0019655 | -1.2010968 | 0.0001017 | 0.0035588 | -0.6162595 | 0.0422636 | 0.9858171 | 0.5848373 | 0.1716696 | 0.5246896 | -1.3292685 | -0.4880877 | -1.7962374 | -0.6059562 | -1.2107279 | -0.0217911 |
| HSERMETANA_PWY | 0.0746296 | 0.3489152 | 0.4504382 | 0.2473730 | 0.0021839 | 0.0388779 | 0.3473933 | 0.0023538 | 0.0446191 | 0.1473527 | 0.1912725 | 0.9858171 | -0.2000406 | 0.2098315 | 0.5283527 | 0.0904967 | 0.4042494 | 0.1254113 | 0.5693753 | -0.0743785 | 0.3690840 |
| NONOXIPENT_PWY | -0.1079297 | 0.1953293 | 0.3065706 | 0.3293683 | 0.0001085 | 0.0047487 | 0.3660825 | 0.0021621 | 0.0446191 | 0.2926541 | 0.0136120 | 0.9858171 | -0.0734285 | 0.6588361 | 0.8034587 | 0.1654654 | 0.4932712 | 0.1341578 | 0.5980072 | 0.0609913 | 0.5243168 |
| PROPFERM_PWY | -1.1733941 | 0.0000000 | 0.0000000 | -0.5452806 | 0.0023327 | 0.0388779 | -1.1841185 | 0.0000045 | 0.0002256 | 0.0935574 | 0.7077896 | 0.9858171 | 1.2776759 | 0.0003881 | 0.0445874 | -0.8933922 | -0.1971690 | -1.6767008 | -0.6915363 | -0.3984685 | 0.5855832 |
| PWY_1269 | 0.2494440 | 0.0568017 | 0.1234820 | 0.4002225 | 0.0024462 | 0.0389173 | 0.3686893 | 0.0467370 | 0.1237939 | 0.4317557 | 0.0200073 | 0.9858171 | 0.0630665 | 0.8086764 | 0.8985293 | 0.1434741 | 0.6569709 | 0.0053870 | 0.7319915 | 0.0688638 | 0.7946476 |
| PWY_5104 | -0.0029780 | 0.9880447 | 0.9965869 | -0.8870113 | 0.0000146 | 0.0018502 | -0.8735792 | 0.0022026 | 0.0446191 | -0.9004434 | 0.0015980 | 0.2796561 | -0.0268642 | 0.9461154 | 0.9714199 | -1.2788612 | -0.4951614 | -1.4280518 | -0.3191067 | -1.4542896 | -0.3465972 |
| PWY_5367 | -0.1763380 | 0.3098164 | 0.4170606 | -0.8229389 | 0.0000044 | 0.0015281 | -1.3000263 | 0.0000004 | 0.0001260 | -0.3458515 | 0.1593626 | 0.9858171 | 0.9541748 | 0.0065146 | 0.2672971 | -1.1647303 | -0.4811475 | -1.7836654 | -0.8163873 | -0.8289442 | 0.1372413 |
| PWY_5747 | 0.3801128 | 0.1486819 | 0.2513565 | -0.8724046 | 0.0010737 | 0.0250528 | -0.8978253 | 0.0165274 | 0.0736276 | -0.8469838 | 0.0234498 | 0.9858171 | 0.0508415 | 0.9228098 | 0.9699202 | -1.3896726 | -0.3551365 | -1.6297661 | -0.1658845 | -1.5780978 | -0.1158697 |
| PWY_5971 | 0.2851474 | 0.1681036 | 0.2736570 | -0.7492655 | 0.0003689 | 0.0117385 | -1.2273875 | 0.0000420 | 0.0016315 | -0.2711434 | 0.3530071 | 0.9858171 | 0.9562441 | 0.0215055 | 0.3516863 | -1.1559662 | -0.3425647 | -1.8028741 | -0.6519009 | -0.8459800 | 0.3036931 |
| PWY_6284 | -0.0763172 | 0.6416310 | 0.7174788 | -0.7170244 | 0.0000211 | 0.0018502 | -1.1472182 | 0.0000018 | 0.0002256 | -0.2868306 | 0.2167977 | 0.9858171 | 0.8603876 | 0.0094022 | 0.2672971 | -1.0402209 | -0.3938279 | -1.6045453 | -0.6898911 | -0.7436411 | 0.1699800 |
| PWY_6293 | -0.3991880 | 0.0295207 | 0.0743327 | -0.7979908 | 0.0000204 | 0.0018502 | -1.2508209 | 0.0000027 | 0.0002256 | -0.3451607 | 0.1810402 | 0.9858171 | 0.9056603 | 0.0137467 | 0.2672971 | -1.1569706 | -0.4390110 | -1.7587819 | -0.7428600 | -0.8525478 | 0.1622265 |
| PWY_6629 | -0.1491528 | 0.0501929 | 0.1140748 | 0.2410823 | 0.0017135 | 0.0374820 | 0.3128749 | 0.0039398 | 0.0492480 | 0.1692897 | 0.1150530 | 0.9858171 | -0.1435852 | 0.3436827 | 0.5866151 | 0.0918025 | 0.3903621 | 0.1016422 | 0.5241076 | -0.0417044 | 0.3802838 |
| PWY_7237 | -0.2382669 | 0.0372708 | 0.0893478 | 0.3820200 | 0.0009489 | 0.0250528 | 0.5125117 | 0.0016953 | 0.0423814 | 0.2515284 | 0.1187112 | 0.9858171 | -0.2609833 | 0.2517765 | 0.5640539 | 0.1579766 | 0.6060635 | 0.1954874 | 0.8295360 | -0.0651378 | 0.5681946 |
| PWY_801 | -0.4000136 | 0.0326876 | 0.0817191 | -0.7960755 | 0.0000309 | 0.0019655 | -1.2545161 | 0.0000040 | 0.0002256 | -0.3376348 | 0.2000867 | 0.9858171 | 0.9168813 | 0.0145845 | 0.2686626 | -1.1627289 | -0.4294220 | -1.7733353 | -0.7356969 | -0.8558680 | 0.1805984 |
| PWY_8178 | -0.0688124 | 0.4122732 | 0.5197219 | 0.3181576 | 0.0002039 | 0.0079305 | 0.3622281 | 0.0026075 | 0.0446191 | 0.2740872 | 0.0217826 | 0.9858171 | -0.0881408 | 0.5992903 | 0.7464470 | 0.1528518 | 0.4834635 | 0.1283182 | 0.5961380 | 0.0404415 | 0.5077329 |
| PWY_8188 | -1.1733941 | 0.0000000 | 0.0000000 | -0.5452806 | 0.0023327 | 0.0388779 | -1.1841185 | 0.0000045 | 0.0002256 | 0.0935574 | 0.7077896 | 0.9858171 | 1.2776759 | 0.0003881 | 0.0445874 | -0.8933922 | -0.1971690 | -1.6767008 | -0.6915363 | -0.3984685 | 0.5855832 |
| PWY_8189 | -1.1733941 | 0.0000000 | 0.0000000 | -0.5452806 | 0.0023327 | 0.0388779 | -1.1841185 | 0.0000045 | 0.0002256 | 0.0935574 | 0.7077896 | 0.9858171 | 1.2776759 | 0.0003881 | 0.0445874 | -0.8933922 | -0.1971690 | -1.6767008 | -0.6915363 | -0.3984685 | 0.5855832 |
| PWY0_1586 | 0.0341764 | 0.6869672 | 0.7608813 | 0.2681826 | 0.0018360 | 0.0377992 | 0.3571433 | 0.0033129 | 0.0456284 | 0.1792219 | 0.1361326 | 0.9858171 | -0.1779213 | 0.2949060 | 0.5689951 | 0.1010066 | 0.4353587 | 0.1205871 | 0.5936995 | -0.0570671 | 0.4155109 |
| PWY0_42 | 0.3942904 | 0.0154267 | 0.0440852 | -0.5433769 | 0.0009276 | 0.0250528 | -0.4880001 | 0.0337446 | 0.1072755 | -0.5987536 | 0.0093549 | 0.9858171 | -0.1107535 | 0.7314183 | 0.8512073 | -0.8614198 | -0.2253340 | -0.9380348 | -0.0379654 | -1.0482800 | -0.1492273 |
| TRPSYN_PWY | -0.2950693 | 0.0005635 | 0.0028177 | 0.3120611 | 0.0002732 | 0.0095630 | 0.3618949 | 0.0026771 | 0.0446191 | 0.2622274 | 0.0283480 | 0.9858171 | -0.0996675 | 0.5531917 | 0.7251576 | 0.1464543 | 0.4776680 | 0.1275591 | 0.5962306 | 0.0281564 | 0.4962985 |
Open code
if(file.exists('gitignore/lm_results/result_pathways_filt.csv') == FALSE){
write.table(result_pathways,
'gitignore/lm_results/result_pathways_filt.csv',
row.names = FALSE)
}5 Elastic net
To assess the predictive power of pathways features on diet strategy, we employed Elastic Net logistic regression.
As we expected very high level of co-linearity, we allowed \(alpha\) to rather small (0, 0.2 or 0.4).
The performance of the predictive models was evaluated through their capacity of discriminate between vegan and omnivore diets, using out-of-sample area under ROC curve (AUC; estimated with out-of-bag bootstrap) as the measure of discriminatory capacity.
All features were transformed by 2 standard deviations (resulting in standard deviation of 0.5)
5.1 Prepare data for glmnet
Open code
data_pathways_glmnet <- data_pathways_original_clr %>%
dplyr::mutate(
vegan = as.numeric(
dplyr::if_else(
Diet == "VEGAN", 1, 0
)
)
) %>%
dplyr::mutate(
dplyr::across(all_of(features), ~ arm::rescale(.))
) %>%
dplyr::select(
ID, vegan, Country, all_of(features)
)5.2 Fit model
Open code
modelac <- "elanet_pathways_filt"
assign(
modelac,
run(
expr = clust_glmnet_sep(
data = data_pathways_glmnet,
outcome = "vegan",
clust_id = "ID",
sample_method = "oos_boot",
N = 500,
alphas = c(0, 0.2, 0.4),
family = "binomial",
seed = 478
),
path = paste0("gitignore/run/", modelac)
)
)5.3 See results
5.3.1 Model summary
Open code
elanet_pathways_filt$model_summary
## alpha lambda auc auc_OutOfSample auc_oos_CIL auc_oos_CIU accuracy
## 1 0.4 0.04303069 0.9412819 0.7897421 0.6777675 0.8811993 0.873494
## accuracy_OutOfSample accuracy_oos_CIL accuracy_oos_CIU
## 1 0.7252609 0.6105935 0.8321154
elanet_pathways_filt$country_AUC
## auc_OutOfSample_IT auc_oos_CIL_IT auc_oos_CIU_IT auc_OutOfSample_CZ
## 1 0.6698679 0.4822149 0.8389606 0.8739966
## auc_oos_CIL_CZ auc_oos_CIU_CZ
## 1 0.7339286 0.99363125.3.2 ROC curve - internal validation
Open code
elanet_pathways_filt$plot5.3.3 Estimated coefficients
Open code
tr <- data.frame(
label = row.names(
elanet_pathways_filt$betas
)[
which(
abs(
elanet_pathways_filt$betas
)>0
)
],
beta = elanet_pathways_filt$betas[
abs(
elanet_pathways_filt$betas
)>0
]
)[-1, ]
tr$pathway <- attr(data_path_originalCZ,
"description")[
colnames(data_path_originalCZ) %in% tr$label]
kableExtra::kable(tr %>% select(label, pathway, beta))| label | pathway | beta | |
|---|---|---|---|
| 2 | CENTFERM_PWY | pyruvate fermentation to butanoate | -0.1635717 |
| 3 | FUCCAT_PWY | fucose degradation | 0.2839196 |
| 4 | GALACTITOLCAT_PWY | galactitol degradation | -0.1996850 |
| 5 | GLUCARDEG_PWY | D-glucarate degradation I | 0.2602967 |
| 6 | GOLPDLCAT_PWY | superpathway of glycerol degradation to 1,3-propanediol | 0.1236880 |
| 7 | HEXITOLDEGSUPER_PWY | superpathway of hexitol degradation (bacteria) | -0.2176896 |
| 8 | HSERMETANA_PWY | L-methionine biosynthesis III | 0.1004888 |
| 9 | LACTOSECAT_PWY | lactose and galactose degradation I | -0.0286681 |
| 10 | NONOXIPENT_PWY | pentose phosphate pathway (non-oxidative branch) I | 0.3841247 |
| 11 | PPGPPMET_PWY | ppGpp metabolism | 0.2606652 |
| 12 | PROPFERM_PWY | superpathway of L-alanine fermentation (Stickland reaction) | -0.1535219 |
| 13 | PWY_5104 | L-isoleucine biosynthesis IV | -0.2145823 |
| 14 | PWY_5136 | fatty acid β-oxidation II (plant peroxisome) | 0.1595753 |
| 15 | PWY_5345 | superpathway of L-methionine biosynthesis (by sulfhydrylation) | 0.0000036 |
| 16 | PWY_5367 | petroselinate biosynthesis | -0.3153989 |
| 17 | PWY_5494 | pyruvate fermentation to propanoate II (acrylate pathway) | -0.0597671 |
| 18 | PWY_5747 | 2-methylcitrate cycle II | -0.0426507 |
| 19 | PWY_622 | starch biosynthesis | -0.0278495 |
| 20 | PWY_6284 | superpathway of unsaturated fatty acids biosynthesis (E. coli) | -0.1552727 |
| 21 | PWY_6293 | superpathway of L-cysteine biosynthesis (fungi) | -0.2893336 |
| 22 | PWY_6470 | peptidoglycan biosynthesis V (β-lactam resistance) | -0.4052525 |
| 23 | PWY_6572 | chondroitin sulfate degradation I (bacterial) | 0.1353945 |
| 24 | PWY_6590 | superpathway of Clostridium acetobutylicum acidogenic fermentation | -0.1394431 |
| 25 | PWY_6807 | xyloglucan degradation II (exoglucanase) | 0.1192990 |
| 26 | PWY_6892 | thiazole component of thiamine diphosphate biosynthesis I | 0.0488946 |
| 27 | PWY_6895 | superpathway of thiamine diphosphate biosynthesis II | -0.2609380 |
| 28 | PWY_6936 | seleno-amino acid biosynthesis (plants) | 0.2567455 |
| 29 | PWY_6992 | 1,5-anhydrofructose degradation | -0.0422256 |
| 30 | PWY_7111 | pyruvate fermentation to isobutanol (engineered) | -0.0626192 |
| 31 | PWY_7185 | UTP and CTP dephosphorylation I | -0.1626717 |
| 32 | PWY_7210 | pyrimidine deoxyribonucleotides biosynthesis from CTP | -0.1586225 |
| 33 | PWY_7234 | inosine-5’-phosphate biosynthesis III | -0.0885173 |
| 34 | PWY_7237 | myo-, chiro- and scyllo-inositol degradation | 0.1260601 |
| 35 | PWY_7312 | dTDP-β-D-fucofuranose biosynthesis | -0.1364504 |
| 36 | PWY_7400 | L-arginine biosynthesis IV (archaebacteria) | -0.0556371 |
| 37 | PWY_7858 | (5Z)-dodecenoate biosynthesis II | 0.1653035 |
| 38 | PWY_801 | homocysteine and cysteine interconversion | -0.2465140 |
| 39 | PWY_8178 | pentose phosphate pathway (non-oxidative branch) II | 0.2210654 |
| 40 | PWY_8188 | L-alanine degradation VI (reductive Stickland reaction) | -0.1546633 |
| 41 | PWY_8189 | L-alanine degradation V (oxidative Stickland reaction) | -0.1541587 |
| 42 | PWY0_1477 | ethanolamine utilization | -0.0720184 |
| 43 | PWY0_42 | 2-methylcitrate cycle I | -0.0442314 |
| 44 | PWY66_389 | phytol degradation | 0.0854299 |
| 45 | PWY66_391 | fatty acid β-oxidation VI (mammalian peroxisome) | -0.0083984 |
| 46 | REDCITCYC | TCA cycle VI (Helicobacter) | 0.0710738 |
| 47 | SO4ASSIM_PWY | assimilatory sulfate reduction I | 0.0312470 |
| 48 | SULFATE_CYS_PWY | superpathway of sulfate assimilation and cysteine biosynthesis | 0.0208325 |
| 49 | TRPSYN_PWY | L-tryptophan biosynthesis | 0.1046697 |
| 50 | UDPNAGSYN_PWY | UDP-N-acetyl-D-glucosamine biosynthesis I | -0.0136180 |
5.3.4 Plot of coefficients
Open code
elacoef <- data.frame(
pathway = row.names(elanet_pathways_filt$betas),
beta_ela = elanet_pathways_filt$betas[, 1]
) %>%
arrange(abs(beta_ela)) %>%
filter(abs(beta_ela) > 0,
!grepl('Intercept', pathway)) %>%
mutate(pathway = factor(pathway, levels = pathway))
plotac <- "elanet_beta_pathway"
path <- "gitignore/figures"
assign(plotac,
ggplot(elacoef,
aes(
x = pathway,
y = beta_ela
)
) +
geom_point() +
geom_hline(yintercept = 0, color = "black") +
labs(
y = "Standardized beta coefficients",
x = "pathway"
) +
theme_minimal() +
coord_flip() +
theme(
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 10),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
legend.position = "bottom"
)
)
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 7,
height = 12
)
}
get(plotac)6 External validation
External validation was performed with an independent Czech cohort.
As a first step, we will use the previously developed and internally validated elastic net model to predict vegan status in the independent Czech cohort. The validation data will be standardized using the mean and standard deviation of each pathway level from the training cohort to ensure comparability across datasets. For each subject in the external validation cohort, we will estimate the predicted probability of being vegan using the elastic net model. This predicted probability will then be used as a variable to discriminate between the diet groups in the independent cohort.
In a 2nd step, we will look at pathways that significantly differed between diet groups (average vegan diet effect across both countries, FDR < 0.05) estimated with linear models (one per pathway) with training cohort. Then we will fit linear models also for external validation cohort. Effect of vegan diet on these pathways will be shown along with 95% confidence interval for all cohorts: training Czech and Italian cohorts, but also in Czech independent (validating) cohort
6.1 Diet discrimination (elastic net)
6.1.1 Get table of weights, means and SDs
Open code
coefs_pathways <- get_coef(
original_data = data_analysis_pathways,
glmnet_model = elanet_pathways_filt)6.1.3 Standardize data in validation set
Open code
data_pathways_validation_pred <- data_pathways_validation_clr %>%
dplyr::mutate(
vegan = if_else(
Diet == 'VEGAN', 1, 0
)
) %>%
dplyr::select(
vegan,
dplyr::all_of(common_predictors)
) %>%
dplyr::mutate(
across(
.cols = -vegan,
.fns = ~ .
- coefs_pathways$mean[
match(
cur_column(),
coefs_pathways$predictor
)
]
)
) %>%
dplyr::mutate(
across(
.cols = -vegan,
.fns = ~ .
/ coefs_pathways$SD[
match(
cur_column(),
coefs_pathways$predictor
)
]
)
)6.1.4 Get predicted value
Open code
elanet_pathways_filt$fit
##
## Call: glmnet::glmnet(x = original_predictors, y = original_outcome, family = family, alpha = optim_par$alpha[1], lambda = optim_par$lamb_1se[1], standardize = standardize)
##
## Df %Dev Lambda
## 1 49 37.63 0.04303
newx <- as.matrix(data_pathways_validation_pred[,-1])
tr <- data_pathways_validation_pred %>%
dplyr::mutate(
predicted_logit = as.numeric(
predict(
elanet_pathways_filt$fit,
newx = newx
)
)
) %>%
dplyr::mutate(
predicted = inv_logit(predicted_logit)
)6.2 Result of external validation
6.2.1 ROC curve
Open code
roc_pathway <- pROC::roc(
vegan ~ predicted_logit,
data = tr,
direction = "<",
levels = c(0, 1),
ci = TRUE
)
plotac <- "roc_pathway_pl"
path <- "gitignore/figures"
assign(plotac, ggroc(roc_pathway))
get(plotac)Open code
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 6,
height = 4.5
)
}6.2.2 Table
Open code
mod <- elanet_pathways_filt
trainAUC <- mean(mod[["valid_performances"]]$auc_resamp_test)
trainCI <- quantile(mod[["valid_performances"]]$auc_resamp_test,
probs = c(1 / 40, 39 / 40)
)
res <- data.frame(
alpha = c(mod$model_summary$alpha, rep("", 4)),
lambda = c(round(mod$model_summary$lambda, 4), rep("", 4)),
`performance type` = c(
"Training set AUC",
"Out-of-sample AUC (all)",
"Out-of-sample AUC (Czech)",
"Out-of-sample AUC (Italy)",
"External validation AUC"
),
`performance [95% CI]` = c(
sprintf("%.3f [%.3f to %.3f]", trainAUC, trainCI[1], trainCI[2]),
sprintf(
"%.3f [%.3f to %.3f]",
mod$model_summary$auc_OutOfSample,
mod$model_summary$auc_oos_CIL,
mod$model_summary$auc_oos_CIU
),
sprintf(
"%.3f [%.3f to %.3f]",
mod$country_AUC$auc_OutOfSample_CZ,
mod$country_AUC$auc_oos_CIL_CZ,
mod$country_AUC$auc_oos_CIU_CZ
),
sprintf(
"%.3f [%.3f to %.3f]",
mod$country_AUC$auc_OutOfSample_IT,
mod$country_AUC$auc_oos_CIL_IT,
mod$country_AUC$auc_oos_CIU_IT
),
sprintf(
"%.3f [%.3f to %.3f]",
roc_pathway[["ci"]][2],
roc_pathway[["ci"]][1],
roc_pathway[["ci"]][3]
)
)
)
kableExtra::kable(
res %>% mutate(across(where(is.numeric), ~ round(.x, 3))),
caption = "Performance of the elastic‑net logistic regression model for discriminating vegan from omnivore status using CLR-transformed pathways based on bacteria reads. The model was developed on the combined training data (Czech and Italian cohorts), with the optimmized `alpha` (mixing parameter) and `lambda` (penalty strength) selected via 10-fold cross-validation. Internal validation employed 500 out‑of‑bag bootstrap resamples: the out‑of‑sample AUC is the mean across resamples, and its 95 % confidence interval (CI) is given by the 2.5th and 97.5th percentiles of the bootstrap distribution. The training‑set AUC and its CI were computed analogously from the in‑bag predictions. External validation was carried out on an independent Czech cohort; the reported AUC is the point estimate on that cohort, and its 95% CI was obtained with DeLong’s method."
)| alpha | lambda | performance.type | performance..95..CI. |
|---|---|---|---|
| 0.4 | 0.043 | Training set AUC | 0.994 [0.975 to 1.000] |
| Out-of-sample AUC (all) | 0.790 [0.678 to 0.881] | ||
| Out-of-sample AUC (Czech) | 0.874 [0.734 to 0.994] | ||
| Out-of-sample AUC (Italy) | 0.670 [0.482 to 0.839] | ||
| External validation AUC | 0.774 [0.684 to 0.864] |
6.3 Diet effect across datasets (forest plot)
Similarly as in training data cohorts, we will fit linear model per each of the selected pathway level with a single fixed effect factor of diet.
6.3.1 Linear model in validation cohort
Open code
data_analysis_pathways <- data_pathways_validation_clr %>%
dplyr::mutate(
Diet_VEGAN = as.numeric(
dplyr::if_else(
Diet == 'VEGAN', 1, 0
)
) ) %>%
dplyr::select(
Diet_VEGAN,
dplyr::everything()
)
summary(data_analysis_pathways[, 1:12])
## Diet_VEGAN ID Diet Country
## Min. :0.0000 Length:101 Length:101 Length:101
## 1st Qu.:0.0000 Class :character Class :character Class :character
## Median :1.0000 Mode :character Mode :character Mode :character
## Mean :0.5743
## 3rd Qu.:1.0000
## Max. :1.0000
## Data X1CMET2_PWY ALLANTOINDEG_PWY ANAEROFRUCAT_PWY
## Length:101 Min. :1.820 Min. :-4.1711 Min. :1.139
## Class :character 1st Qu.:2.582 1st Qu.:-3.5853 1st Qu.:1.598
## Mode :character Median :2.730 Median :-3.3514 Median :1.842
## Mean :2.695 Mean :-2.8111 Mean :1.825
## 3rd Qu.:2.847 3rd Qu.:-2.0140 3rd Qu.:2.029
## Max. :3.105 Max. :-0.5367 Max. :2.748
## ANAGLYCOLYSIS_PWY ARG.POLYAMINE_SYN ARGININE_SYN4_PWY ARGSYN_PWY
## Min. :1.990 Min. :-1.4687 Min. :-1.4437 Min. :2.000
## 1st Qu.:2.777 1st Qu.: 0.2802 1st Qu.: 0.5786 1st Qu.:2.752
## Median :2.942 Median : 0.5856 Median : 0.9553 Median :2.907
## Mean :2.913 Mean : 0.5673 Mean : 0.9585 Mean :2.854
## 3rd Qu.:3.086 3rd Qu.: 0.8994 3rd Qu.: 1.4342 3rd Qu.:3.001
## Max. :3.541 Max. : 1.7341 Max. : 2.1724 Max. :3.4006.3.1.1 Define number of pathways and covariates
Open code
n_covarites <- 5
n_features <- ncol(data_analysis_pathways) - n_covarites6.3.1.2 Create empty objects
Open code
outcome <- vector('double', n_features)
est_VGdiet <- vector('double', n_features)
P_VGdiet <- vector('double', n_features)
CI_L_VGdiet <- vector('double', n_features)
CI_U_VGdiet <- vector('double', n_features)6.3.1.3 Linear models per outcome
Open code
for (i in 1:n_features) {
## define variable
data_analysis_pathways$outcome <- data_analysis_pathways[, (i + n_covarites)]
## fit model
model <- lm(outcome ~ Diet_VEGAN, data = data_analysis_pathways)
## save results
outcome[i] <- names(data_analysis_pathways)[i + n_covarites]
## extract diet effect
tr <- confint(model)
CI_L_VGdiet[i] <- tr[which(row.names(tr) == "Diet_VEGAN"), ][1]
CI_U_VGdiet[i] <- tr[which(row.names(tr) == "Diet_VEGAN"), ][2]
est_VGdiet[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Diet_VEGAN"
), 1
]
P_VGdiet[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Diet_VEGAN"
), 4
]
}6.3.1.4 Results table
Open code
## relevant pathways
diet_sensitive_pathways <- result_pathways %>%
filter(
fdr_VGdiet_avg < 0.05
) %>%
select(
outcome
)
result_pathways_val <- data.frame(
outcome,
est_VGdiet, P_VGdiet,
CI_L_VGdiet, CI_U_VGdiet
) %>%
filter(outcome %in% diet_sensitive_pathways$outcome)
tr <- attr(
data_path_originalCZ,
"description")[colnames(data_path_originalCZ) %in% result_pathways_val$outcome]
val_res <- cbind(pathway = tr, result_pathways_val) %>%
arrange(desc(est_VGdiet))
kableExtra::kable(val_res,
caption = 'Results of linear models estimating the effect of diet on the CLR-transformed proportion of a given functional pathway as the outcome. Only pathways whose proportion significantly differed between diet groups in training cohorts (FDR < 0.05, average effect across both training cohorts) were included. `est` represents the estimated effects (regression coefficient), indicating how much the CLR-transformed proportions differ between vegans and omnivores. `P`: p-value, `fdr`: p-value adjusted for multiple comparisons, and `CI_L` and `CI_U` represent the lower and upper bounds of the 95% confidence interval, respectively. All estimates in a single row are based on a single model') | pathway | outcome | est_VGdiet | P_VGdiet | CI_L_VGdiet | CI_U_VGdiet |
|---|---|---|---|---|---|
| peptidoglycan maturation (meso-diaminopimelate containing) | PWY0_1586 | 0.1535083 | 0.0140132 | 0.0317355 | 0.2752811 |
| pentose phosphate pathway (non-oxidative branch) I | NONOXIPENT_PWY | 0.1334798 | 0.0209651 | 0.0205883 | 0.2463713 |
| L-tryptophan biosynthesis | TRPSYN_PWY | 0.1327050 | 0.0186200 | 0.0226470 | 0.2427630 |
| myo-, chiro- and scyllo-inositol degradation | PWY_7237 | 0.1258992 | 0.1422591 | -0.0429823 | 0.2947807 |
| pentose phosphate pathway (non-oxidative branch) II | PWY_8178 | 0.1132709 | 0.0398410 | 0.0053646 | 0.2211772 |
| superpathway of L-tryptophan biosynthesis | PWY_6629 | 0.0985685 | 0.0431588 | 0.0030925 | 0.1940445 |
| L-methionine biosynthesis III | HSERMETANA_PWY | 0.0880344 | 0.0798108 | -0.0106562 | 0.1867249 |
| Calvin-Benson-Bassham cycle | CALVIN_PWY | 0.0596844 | 0.2390520 | -0.0402962 | 0.1596651 |
| galactitol degradation | GALACTITOLCAT_PWY | 0.0535177 | 0.8777221 | -0.6348888 | 0.7419241 |
| superpathway of hexitol degradation (bacteria) | HEXITOLDEGSUPER_PWY | -0.0106956 | 0.9728494 | -0.6326706 | 0.6112795 |
| 2-methylcitrate cycle II | PWY_5747 | -0.0366123 | 0.7913669 | -0.3104904 | 0.2372658 |
| palmitate biosynthesis (type II fatty acid synthase) | PWY_5971 | -0.0392706 | 0.7701391 | -0.3052183 | 0.2266771 |
| 2-methylcitrate cycle I | PWY0_42 | -0.0517980 | 0.7101465 | -0.3275435 | 0.2239474 |
| CMP-3-deoxy-D-manno-octulosonate biosynthesis | PWY_1269 | -0.0634412 | 0.6797402 | -0.3674721 | 0.2405898 |
| L-isoleucine biosynthesis IV | PWY_5104 | -0.2815587 | 0.1150244 | -0.6329268 | 0.0698094 |
| L-alanine degradation V (oxidative Stickland reaction) | PWY_8189 | -0.4025018 | 0.0014258 | -0.6458656 | -0.1591379 |
| L-alanine degradation VI (reductive Stickland reaction) | PWY_8188 | -0.4025018 | 0.0014258 | -0.6458656 | -0.1591379 |
| superpathway of L-alanine fermentation (Stickland reaction) | PROPFERM_PWY | -0.4025018 | 0.0014258 | -0.6458656 | -0.1591379 |
| superpathway of L-cysteine biosynthesis (fungi) | PWY_6293 | -0.4259077 | 0.0020150 | -0.6923133 | -0.1595022 |
| homocysteine and cysteine interconversion | PWY_801 | -0.4273921 | 0.0020285 | -0.6949062 | -0.1598780 |
| superpathway of unsaturated fatty acids biosynthesis (E. coli) | PWY_6284 | -0.9274652 | 0.0000322 | -1.3497696 | -0.5051608 |
| petroselinate biosynthesis | PWY_5367 | -1.3126488 | 0.0000030 | -1.8382687 | -0.7870289 |
Open code
if(file.exists('gitignore/lm_results/result_pathways_validation_filt.csv') == FALSE){
write.table(val_res,
'gitignore/lm_results/result_pathways_validation_filt.csv',
row.names = FALSE)
}6.3.2 Forest plot
6.3.2.1 Data preparation
Open code
len <- nrow(diet_sensitive_pathways)
## subset result tables
result_pathways_subset <- result_pathways %>%
filter(outcome %in% diet_sensitive_pathways$outcome)
result_pathways_val_subset <- result_pathways_val %>%
filter(outcome %in% diet_sensitive_pathways$outcome)
## create a data frame
data_forest <- data.frame(
outcome = rep(diet_sensitive_pathways$outcome, 3),
beta = c(
result_pathways_subset$est_VGdiet_inCZ,
result_pathways_subset$est_VGdiet_inIT,
result_pathways_val_subset$est_VGdiet
),
lower = c(
result_pathways_subset$CI_L_VGdiet_inCZ,
result_pathways_subset$CI_L_VGdiet_inIT,
result_pathways_val_subset$CI_L_VGdiet
),
upper = c(
result_pathways_subset$CI_U_VGdiet_inCZ,
result_pathways_subset$CI_U_VGdiet_inIT,
result_pathways_val_subset$CI_U_VGdiet
),
dataset = c(
rep("CZ", len),
rep("IT", len),
rep("Validation", len)
)
)
validation_order <- data_forest %>%
left_join(
val_res %>% select(outcome, pathway),
by = 'outcome'
) %>%
group_by(outcome) %>%
summarise(beta_mean = mean(beta), .groups = "drop",
pathway = first(pathway)) %>%
arrange(beta_mean) %>%
pull(pathway)
up_winners <- data_forest %>%
pivot_wider(names_from = dataset,
values_from = c(beta, lower, upper)) %>%
left_join(
elacoef %>% mutate(outcome = pathway) %>% select(-pathway),
by = 'outcome') %>%
filter(beta_CZ > 0,
beta_IT > 0,
lower_Validation > 0,
beta_ela > 0.1) %>%
select(outcome)
down_winners <- data_forest %>%
pivot_wider(names_from = dataset,
values_from = c(beta, lower, upper)) %>%
left_join(elacoef %>% mutate(outcome = pathway) %>% select(-pathway),
by = 'outcome') %>%
filter(beta_CZ < 0,
beta_IT < 0,
upper_Validation < 0,
beta_ela < -0.1) %>%
select(outcome)
winners <- as.character(c(up_winners$outcome, down_winners$outcome))
data_forest <- data_forest %>%
mutate(in_winner = if_else(outcome %in% winners, TRUE, FALSE, missing = FALSE)) %>%
left_join(
val_res %>% select(outcome, pathway),
by = 'outcome'
) %>%
left_join(
elacoef %>% mutate(outcome = pathway) %>% select(-pathway),
by = 'outcome') %>%
mutate(pathway = factor(pathway, levels = validation_order))6.3.2.2 Plotting
Open code
plotac <- "forest_pathway"
path <- "gitignore/figures"
colors <- c("CZ" = "#150999", "IT" = "#329243", "Validation" = "grey60")
assign(
plotac,
ggplot(data_forest,
aes(x = pathway,
y = beta,
ymin = lower,
ymax = upper,
color = dataset)) +
geom_pointrange(position = position_dodge(width = 0.5), size = 0.5) +
geom_hline(yintercept = 0, color = "black") +
geom_errorbar(position = position_dodge(width = 0.5), width = 0.2) +
scale_color_manual(values = colors) +
labs(
y = "Effect of vegan diet on clr-trasformed pathway",
x = "Outcome",
color = "Dataset"
) +
theme_minimal() +
coord_flip() +
scale_x_discrete(
labels = setNames(
ifelse(data_forest$in_winner,
paste0("**", data_forest$pathway, "**"),
as.character(data_forest$pathway)
), data_forest$pathway
)
) +
theme(
axis.text.x = element_text(size = 10),
axis.text.y = ggtext::element_markdown(size = 10),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
legend.position = "bottom"
)
)
get(plotac)Diet, Country, and the interaction term Diet:Country as predictors. In the independent Czech validation cohort, Diet was the only fixed-effect predictor. Patways validated in the linear model and showing predictive power in the elastic net model (|β| > 0.1) are boldOpen code
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 10,
height = 12
)
}6.3.3 Boxplot
Open code
plotac <- "boxplot_pathway"
path <- "gitignore/figures"
colo <- c("#F9FFAF", "#329243")
boxplot_cond <- function(variable) {
p <- ggboxplot(data_merged,
x = "Diet",
y = variable,
fill = "Diet",
tip.length = 0.15,
palette = colo,
outlier.shape = 1,
lwd = 0.25,
outlier.size = 0.8,
facet.by = "Data",
title = variable,
ylab = "CLR(pathway proportion)"
) +
theme(
plot.title = element_text(size = 10),
axis.title = element_text(size = 8),
axis.text.y = element_text(size = 7),
axis.text.x = element_blank(),
axis.title.x = element_blank()
)
return(p)
}
# Plot all outcomes
plots <- map(diet_sensitive_pathways$outcome, boxplot_cond)
# Create a matrix of plots
plots_arranged <- ggarrange(plotlist = plots, ncol = 4, nrow = 6,
common.legend = TRUE)
assign(plotac, plots_arranged)
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 9,
height = 13
)
}
get(plotac)7 Linear model VG duration
Next, we fit another series of linear models, this time modelling clr-transformed functional pathways (inferred from shotgun metagenomic sequencing) using the following fixed-effect predictors: duration of vegan status (Diet_duration, scaled in tens of years), Country, their interaction (Diet_duration × Country), and Age:
\[ \text{CLR(pathway proportion)} = \alpha + \beta_{1} \times \text{Country} + \beta_{2} \times \text{Diet duration} + \beta_{3} \times (\text{Country}:\text{Diet duration}) + \beta_{4} \times \text{Age} + \epsilon \]
This analysis includes only vegan participants, while omnivores are excluded. The aim was to test whether functional pathways that differ between vegans and omnivores also vary within the vegan group itself, depending on how long participants have been vegan. In other words, we asked whether long-term vegans show stronger up- or down-regulation of diet-sensitive pathways compared to those who adopted the diet more recently.
Because longer vegan duration is likely correlated with age (e.g. a 20-year-old cannot have 20 years of vegan history, unlike a 40- or 60-year-old), we also adjusted for age in the models.
7.1 Get data
7.1.1 Training
Open code
meta_trainIT <- read.xlsx('gitignore/data/diet_duration_age.xlsx', sheet = 1)
meta_trainCZ <- read.xlsx('gitignore/data/diet_duration_age.xlsx', sheet = 2) %>%
dplyr::mutate(ID = paste0('T', ID),
Sex = SEX) %>%
dplyr::select(-SEX)
meta_trainIT[1:5,]
## ID COHORT GRP Diet_duration Age Sex
## 1 VOV002 IT_train OM 0 61.4 F
## 2 VOV003 IT_train OM 0 43.7 F
## 3 VOV004 IT_train OM 0 61.1 F
## 4 VOV006 IT_train OM 0 31.7 F
## 5 VOV007 IT_train OM 0 31.8 F
meta_trainCZ[1:5,]
## ID COHORT GRP Diet_duration Age Sex
## 1 T97 CZ_train OM 0 26.76438 M
## 2 T98 CZ_train OM 0 27.61370 F
## 3 T134 CZ_train OM 0 21.64384 F
## 4 T136 CZ_train OM 0 31.25479 M
## 5 T137 CZ_train OM 0 40.42192 F
data_meta_original <- bind_rows(meta_trainIT, meta_trainCZ) %>%
rename(`Sample` = `ID`)
data_meta_original[1:5,]
## Sample COHORT GRP Diet_duration Age Sex
## 1 VOV002 IT_train OM 0 61.4 F
## 2 VOV003 IT_train OM 0 43.7 F
## 3 VOV004 IT_train OM 0 61.1 F
## 4 VOV006 IT_train OM 0 31.7 F
## 5 VOV007 IT_train OM 0 31.8 F
data_pathways_original2 <- data_pathways_original_clr %>%
rename(`Sample` = `ID`) %>%
left_join(data_meta_original, by = 'Sample') %>%
select(Sample:Diet, COHORT:Sex, everything())
data_pathways_original2 %>% dim()
## [1] 166 359
data_pathways_original2[
1:4,
(ncol(data_pathways_original2)-10):ncol(data_pathways_original2)
]
## SO4ASSIM_PWY SULFATE_CYS_PWY TCA_GLYOX_BYPASS TCA THISYN_PWY
## 1 -0.1880565 0.4369528 -0.2921062 0.1011286 1.643560
## 2 -0.4462730 0.1693518 -0.5477345 0.1521391 1.478927
## 3 -2.4927554 -1.6876045 -2.1987741 -2.0383007 2.849858
## 4 -2.2425268 -1.4401533 -2.7500641 -2.5363305 2.799085
## THISYNARA_PWY THRESYN_PWY TRNA_CHARGING_PWY TRPSYN_PWY UDPNAGSYN_PWY
## 1 1.438471 1.880293 2.420052 1.424845 1.358719
## 2 0.884228 1.814359 2.601992 1.624480 1.603390
## 3 2.317259 3.015584 3.856117 1.818168 2.435054
## 4 2.327101 3.044502 3.956278 2.064342 3.331618
## VALSYN_PWY
## 1 2.479489
## 2 2.416387
## 3 3.539453
## 4 3.409071
data_pathways_original2[1:4, 1:10]
## Sample Diet COHORT GRP Diet_duration Age Sex Country Data X1CMET2_PWY
## 1 VOV002 OMNI IT_train OM 0 61.4 F IT IT_tr 2.186503
## 2 VOV003 OMNI IT_train OM 0 43.7 F IT IT_tr 2.008170
## 3 VOV004 OMNI IT_train OM 0 61.1 F IT IT_tr 3.418194
## 4 VOV006 OMNI IT_train OM 0 31.7 F IT IT_tr 3.1742037.1.2 Validation
Open code
data_meta_valid <- read.xlsx('gitignore/data/diet_duration_age.xlsx', sheet = 3) %>%
rename(`Sample` = `ID`)
data_pathways_valid2 <- data_pathways_validation_clr %>%
rename(`Sample` = `ID`) %>%
left_join(data_meta_valid, by = 'Sample') %>%
select(Sample:Diet, COHORT:SEX, everything())
data_pathways_valid2 %>% dim()
## [1] 101 359
data_pathways_valid2[
1:4,
(ncol(data_pathways_valid2)-10):ncol(data_pathways_valid2)
]
## SO4ASSIM_PWY SULFATE_CYS_PWY TCA_GLYOX_BYPASS TCA THISYN_PWY
## 1 -0.8517185 -0.08304722 -3.365452 -3.033998 2.165097
## 2 -0.1495979 0.56899174 -3.314443 -2.782678 2.088872
## 3 -2.2760134 -1.47638331 -3.369634 -3.061441 2.328548
## 4 -3.5178606 -2.70918461 -3.528815 -3.179935 2.517770
## THISYNARA_PWY THRESYN_PWY TRNA_CHARGING_PWY TRPSYN_PWY UDPNAGSYN_PWY
## 1 2.410630 2.475191 3.147883 2.856516 2.047496
## 2 1.721339 2.516181 2.951190 2.587668 2.403907
## 3 2.304818 2.513833 3.060194 2.777382 2.104066
## 4 2.168090 2.770383 3.276645 3.078048 2.792120
## VALSYN_PWY
## 1 3.237373
## 2 3.172197
## 3 3.174123
## 4 3.403835
data_pathways_valid2[1:4, 1:10]
## Sample Diet COHORT GRP Diet_duration Age SEX Country Data X1CMET2_PWY
## 1 K4 VEGAN CZ_val VN <NA> 30.98904 M CZ valid 2.830535
## 2 K5 VEGAN CZ_val VN <NA> 30.66575 F CZ valid 2.582339
## 3 K7 VEGAN CZ_val VN <NA> 32.97808 F CZ valid 2.825255
## 4 K12 VEGAN CZ_val VN <NA> 27.89315 M CZ valid 3.022772
data_pathways_valid2 %>% select(Sample, Diet, GRP)
## Sample Diet GRP
## 1 K4 VEGAN VN
## 2 K5 VEGAN VN
## 3 K7 VEGAN VN
## 4 K12 VEGAN VN
## 5 K13 VEGAN VN
## 6 K15 VEGAN VN
## 7 K16 VEGAN VN
## 8 K18 VEGAN VN
## 9 K19 VEGAN VN
## 10 K25 VEGAN VN
## 11 K26 VEGAN VN
## 12 K31 VEGAN VN
## 13 K32 VEGAN VN
## 14 K34 VEGAN VN
## 15 K35 VEGAN VN
## 16 K38 VEGAN VN
## 17 K39 VEGAN VN
## 18 K42 OMNI OM
## 19 K43 OMNI OM
## 20 K45 VEGAN VN
## 21 K46 VEGAN VN
## 22 K48 VEGAN VN
## 23 K49 VEGAN VN
## 24 K55 VEGAN VN
## 25 K56 VEGAN VN
## 26 K61 VEGAN VN
## 27 K62 VEGAN VN
## 28 K64 VEGAN VN
## 29 K65 VEGAN VN
## 30 K73 VEGAN VN
## 31 K74 VEGAN VN
## 32 K82 VEGAN VN
## 33 K85 VEGAN VN
## 34 K103 VEGAN VN
## 35 K106 VEGAN VN
## 36 K107 VEGAN VN
## 37 K114 VEGAN VN
## 38 K117 VEGAN VN
## 39 K119 VEGAN VN
## 40 K120 VEGAN VN
## 41 K123 OMNI OM
## 42 K124 OMNI OM
## 43 K126 OMNI OM
## 44 K127 OMNI OM
## 45 K130 OMNI OM
## 46 K131 OMNI OM
## 47 K143 OMNI OM
## 48 K150 OMNI OM
## 49 K151 OMNI OM
## 50 K154 OMNI OM
## 51 K155 OMNI OM
## 52 K175 OMNI OM
## 53 K176 OMNI OM
## 54 K178 OMNI OM
## 55 K179 OMNI OM
## 56 K182 OMNI OM
## 57 K183 OMNI OM
## 58 K198 OMNI OM
## 59 K199 OMNI OM
## 60 K201 OMNI OM
## 61 K202 OMNI OM
## 62 K205 VEGAN VN
## 63 K206 VEGAN VN
## 64 K209 OMNI OM
## 65 K210 OMNI OM
## 66 K216 VEGAN VN
## 67 K217 VEGAN VN
## 68 K219 VEGAN VN
## 69 K220 VEGAN VN
## 70 K222 OMNI OM
## 71 K223 OMNI OM
## 72 K226 VEGAN VN
## 73 K227 VEGAN VN
## 74 K230 OMNI OM
## 75 K231 OMNI OM
## 76 K234 OMNI OM
## 77 K235 OMNI OM
## 78 K238 OMNI OM
## 79 K239 OMNI OM
## 80 K250 VEGAN VN
## 81 K251 VEGAN VN
## 82 K253 OMNI OM
## 83 K254 OMNI OM
## 84 K258 VEGAN VN
## 85 K260 VEGAN VN
## 86 K261 VEGAN VN
## 87 K263 VEGAN VN
## 88 K264 VEGAN VN
## 89 K266 OMNI OM
## 90 K267 OMNI OM
## 91 K270 OMNI OM
## 92 K281 VEGAN VN
## 93 K285 VEGAN VN
## 94 K289 VEGAN VN
## 95 K291 VEGAN VN
## 96 K292 VEGAN VN
## 97 K298 OMNI OM
## 98 K299 OMNI OM
## 99 K318 OMNI OM
## 100 K329 OMNI OM
## 101 K330 OMNI OM
data_pathways_valid2 %>% select(Sample, Diet, GRP, Diet_duration) %>%
filter(
(Diet == 'VEGAN' & GRP == 'OM') | (Diet == 'OMNI' & GRP == 'VN')
)
## [1] Sample Diet GRP Diet_duration
## <0 rows> (or 0-length row.names)7.2 Training cohort
7.2.1 Select data
Open code
data_analysis <- data_pathways_original2 %>%
mutate(Country_IT = as.numeric(
dplyr::if_else(
Country == "IT", 0.5, -0.5
)
)
) %>%
filter(Diet == 'VEGAN',
!is.na(Diet_duration)) %>%
select(1:8, Country_IT, all_of(diet_sensitive_pathways$outcome))
summary(data_analysis[ , 1:12])
## Sample Diet COHORT GRP
## Length:96 Length:96 Length:96 Length:96
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Diet_duration Age Sex Country
## Min. : 0.600 Min. :18.25 Length:96 Length:96
## 1st Qu.: 3.150 1st Qu.:27.57 Class :character Class :character
## Median : 5.000 Median :32.15 Mode :character Mode :character
## Mean : 5.492 Mean :34.97
## 3rd Qu.: 6.000 3rd Qu.:40.77
## Max. :20.000 Max. :60.70
## Country_IT CALVIN_PWY GALACTITOLCAT_PWY HEXITOLDEGSUPER_PWY
## Min. :-0.5000 Min. :1.496 Min. :-5.9819 Min. :-4.7341
## 1st Qu.:-0.5000 1st Qu.:2.395 1st Qu.:-4.9624 1st Qu.:-3.7146
## Median :-0.5000 Median :2.764 Median :-4.1624 Median :-2.9213
## Mean :-0.1042 Mean :2.698 Mean :-3.7681 Mean :-2.6037
## 3rd Qu.: 0.5000 3rd Qu.:2.928 3rd Qu.:-2.8106 3rd Qu.:-1.6531
## Max. : 0.5000 Max. :4.289 Max. : 0.3174 Max. : 0.8302
data_analysis[1:10, 1:12]
## Sample Diet COHORT GRP Diet_duration Age Sex Country Country_IT
## 1 VOV010 VEGAN IT_train VG 2.6 25.7 M IT 0.5
## 2 VOV012 VEGAN IT_train VG 3.0 55.7 F IT 0.5
## 3 VOV014 VEGAN IT_train VG 2.0 35.7 F IT 0.5
## 4 VOV018 VEGAN IT_train VG 3.0 33.5 F IT 0.5
## 5 VOV019 VEGAN IT_train VG 2.0 20.3 F IT 0.5
## 6 VOV020 VEGAN IT_train VG 5.0 24.7 M IT 0.5
## 7 VOV021 VEGAN IT_train VG 0.6 18.5 F IT 0.5
## 8 VOV029 VEGAN IT_train VG 7.0 52.3 F IT 0.5
## 9 VOV043 VEGAN IT_train VG 5.0 29.8 F IT 0.5
## 10 VOV044 VEGAN IT_train VG 5.0 32.2 F IT 0.5
## CALVIN_PWY GALACTITOLCAT_PWY HEXITOLDEGSUPER_PWY
## 1 2.011687 -3.3527234 -2.1492342
## 2 2.051551 -5.2216023 -3.9759989
## 3 2.105309 -5.8716201 -4.6238101
## 4 2.564119 -5.3142270 -4.0664170
## 5 2.460516 -2.8677805 -1.6570946
## 6 1.495842 -2.6777707 -1.4943911
## 7 2.047515 -0.4930437 0.1209159
## 8 2.789296 -4.8926198 -3.6448097
## 9 3.165458 -3.0580872 -1.8510735
## 10 2.246076 -4.0299535 -2.82318307.2.2 Define number of pathways and covariates
Open code
n_covarites <- 9
n_features <- ncol(data_analysis) - n_covarites7.2.3 Create empty objects
Open code
outcome <- vector('double', n_features)
est_Diet_duration_inCZ <- vector('double', n_features)
est_Diet_duration_inIT <- vector('double', n_features)
est_Diet_duration_avg <- vector('double', n_features)
est_ITcountry_avg <- vector('double', n_features)
diet_country_int <- vector('double', n_features)
P_Diet_duration_inCZ <- vector('double', n_features)
P_Diet_duration_inIT <- vector('double', n_features)
P_Diet_duration_avg <- vector('double', n_features)
P_ITcountry_avg <- vector('double', n_features)
est_Age <- vector('double', n_features)
P_Age <- vector('double', n_features)
P_diet_country_int <- vector('double', n_features)
CI_L_Diet_duration_inCZ <- vector('double', n_features)
CI_L_Diet_duration_inIT <- vector('double', n_features)
CI_L_Diet_duration_avg <- vector('double', n_features)
CI_U_Diet_duration_inCZ <- vector('double', n_features)
CI_U_Diet_duration_inIT <- vector('double', n_features)
CI_U_Diet_duration_avg <- vector('double', n_features)7.2.4 Estimate over outcomes
Open code
if(file.exists('gitignore/lm_results/result_pathways_SGB30_VGdur_training.Rds') == FALSE){
for (i in 1:n_features) {
## define variable
data_analysis$outcome <- data_analysis[, (i + n_covarites)]
## fit model
model <- lm(outcome ~ Country_IT * Diet_duration + Age,
data = data_analysis %>%
mutate(Diet_duration = (Diet_duration/10),
Age = (Age-33)/10))
## get contrast (effects of diet BY COUNTRY)
contrast_emm <- summary(
pairs(
emmeans(
model,
specs = ~ Diet_duration | Country_IT,
at = list(Diet_duration = c(0, 1))
),
interaction = TRUE,
adjust = "none"
),
infer = c(TRUE, TRUE)
)
## save results
outcome[i] <- names(data_analysis)[i + n_covarites]
## country and age effect
est_ITcountry_avg[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Country_IT"
), 1
]
est_Age[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Age"
), 1
]
P_Age[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Age"
), 4
]
## diet effect
tr <- confint(model)
CI_L_Diet_duration_avg[i] <- tr[which(row.names(tr) == 'Diet_duration'),][1]
CI_U_Diet_duration_avg[i] <- tr[which(row.names(tr) == 'Diet_duration'),][2]
est_Diet_duration_avg[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Diet_duration"
), 1
]
P_Diet_duration_avg[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Diet_duration"
), 4
]
est_Diet_duration_inCZ[i] <- -contrast_emm[1,3]
P_Diet_duration_inCZ[i] <- contrast_emm$p.value[1]
CI_L_Diet_duration_inCZ[i] <- -contrast_emm$upper.CL[1]
CI_U_Diet_duration_inCZ[i] <- -contrast_emm$lower.CL[1]
est_Diet_duration_inIT[i] <- -contrast_emm[2,3]
P_Diet_duration_inIT[i] <- contrast_emm$p.value[2]
CI_L_Diet_duration_inIT[i] <- -contrast_emm$upper.CL[2]
CI_U_Diet_duration_inIT[i] <- -contrast_emm$lower.CL[2]
## interaction
diet_country_int[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Country_IT:Diet_duration"
), 1
]
P_diet_country_int[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Country_IT:Diet_duration"
), 4
]
}
result_pathways <- data.frame(
outcome,
est_ITcountry_avg, P_ITcountry_avg,
est_Age, P_Age,
est_Diet_duration_avg, P_Diet_duration_avg,
est_Diet_duration_inCZ, P_Diet_duration_inCZ,
est_Diet_duration_inIT, P_Diet_duration_inIT,
diet_country_int, P_diet_country_int,
CI_L_Diet_duration_avg, CI_U_Diet_duration_avg,
CI_L_Diet_duration_inCZ, CI_U_Diet_duration_inCZ,
CI_L_Diet_duration_inIT, CI_U_Diet_duration_inIT
)
saveRDS(result_pathways,
'gitignore/lm_results/result_pathways_SGB30_VGdur_training.Rds')
}
result_pathways <- readRDS('gitignore/lm_results/result_pathways_SGB30_VGdur_training.Rds')7.2.5 Show and save results
Open code
kableExtra::kable(result_pathways %>%
dplyr::select(
outcome,
est_ITcountry_avg, P_ITcountry_avg,
est_Age, P_Age,
est_Diet_duration_avg, P_Diet_duration_avg,
est_Diet_duration_inCZ, P_Diet_duration_inCZ,
est_Diet_duration_inIT, P_Diet_duration_inIT,
diet_country_int, P_diet_country_int,
CI_L_Diet_duration_avg, CI_U_Diet_duration_avg,
CI_L_Diet_duration_inCZ, CI_U_Diet_duration_inCZ,
CI_L_Diet_duration_inIT, CI_U_Diet_duration_inIT
) %>%
arrange(P_Diet_duration_avg),
caption = "Results of linear models, modelling clr-transformed pathways (functional pathways inferred from shotgun metagenomic sequencing, expressed as relative proportions) with vegan status duration (`Diet_duration`), `Country`, their interaction (`Diet_duration × Country`), and `Age` as predictors, using training data only (Czech and Italian vegan cohorts). Only pathways with clr-transformed levels differing by diet (FDR < 0.05, average effect across both countries) are shown. `est` prefix: denotes estimated effects (regression coefficients), i.e. expected change in clr-transforemd pathway proportion per 10 years of vegan diet or age, or for country (Italy vs Czech Republic). `P`: p-value. `CI_L` and `CI_U`: lower and upper bounds of the 95% confidence interval. `avg` suffix: effect averaged across cohorts, whereas `inCZ` and `inIT` indicate effects in the Czech or Italian cohort, respectively. All estimates in a single row are based on a single model with interaction",
escape = FALSE
)| outcome | est_ITcountry_avg | P_ITcountry_avg | est_Age | P_Age | est_Diet_duration_avg | P_Diet_duration_avg | est_Diet_duration_inCZ | P_Diet_duration_inCZ | est_Diet_duration_inIT | P_Diet_duration_inIT | diet_country_int | P_diet_country_int | CI_L_Diet_duration_avg | CI_U_Diet_duration_avg | CI_L_Diet_duration_inCZ | CI_U_Diet_duration_inCZ | CI_L_Diet_duration_inIT | CI_U_Diet_duration_inIT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PWY0_42 | -0.2773973 | 0 | -0.1004082 | 0.3386596 | 0.8437349 | 0.0175709 | 0.1036314 | 0.7679825 | 1.5838384 | 0.0057482 | 1.4802070 | 0.0192055 | 0.1507842 | 1.5366857 | -0.5920523 | 0.7993152 | 0.4716463 | 2.6960305 |
| PWY_5104 | -1.1008129 | 0 | -0.1255575 | 0.3748064 | 0.8819153 | 0.0640597 | -0.3115837 | 0.5111188 | 2.0754142 | 0.0072186 | 2.3869979 | 0.0053976 | -0.0526011 | 1.8164317 | -1.2497857 | 0.6266184 | 0.5755072 | 3.5753213 |
| PWY0_1586 | -0.1532302 | 0 | -0.1244985 | 0.0668966 | 0.2778437 | 0.2187251 | 0.0725126 | 0.7482276 | 0.4831749 | 0.1829736 | 0.4106623 | 0.3064622 | -0.1677868 | 0.7234743 | -0.3748755 | 0.5199007 | -0.2320661 | 1.1984158 |
| PWY_5747 | -0.5426096 | 0 | 0.1577512 | 0.3724702 | 0.6957188 | 0.2399696 | -0.1858384 | 0.7537062 | 1.5772759 | 0.0982099 | 1.7631142 | 0.0955870 | -0.4726613 | 1.8640988 | -1.3588264 | 0.9871497 | -0.2979845 | 3.4525362 |
| PWY_5367 | 0.0661884 | 0 | -0.0863856 | 0.4895611 | 0.4896242 | 0.2423707 | 0.1337625 | 0.7495402 | 0.8454859 | 0.2087336 | 0.7117233 | 0.3390805 | -0.3368826 | 1.3161310 | -0.6960039 | 0.9635290 | -0.4810648 | 2.1720365 |
| TRPSYN_PWY | -0.4380714 | 0 | -0.0620052 | 0.3493535 | 0.2170546 | 0.3270580 | 0.0714053 | 0.7475225 | 0.3627038 | 0.3076609 | 0.2912985 | 0.4594053 | -0.2205018 | 0.6546109 | -0.3678767 | 0.5106874 | -0.3395780 | 1.0649857 |
| PWY_7237 | -0.4726868 | 0 | -0.1277910 | 0.1446405 | 0.2753235 | 0.3453469 | 0.0708106 | 0.8085468 | 0.4798364 | 0.3057300 | 0.4090259 | 0.4305587 | -0.3012190 | 0.8518660 | -0.5080058 | 0.6496269 | -0.4455194 | 1.4051922 |
| PWY_6284 | 0.1978288 | 0 | -0.0630640 | 0.6001771 | 0.3609705 | 0.3700236 | 0.1088588 | 0.7872983 | 0.6130822 | 0.3429492 | 0.5042234 | 0.4813548 | -0.4349332 | 1.1568742 | -0.6901838 | 0.9079015 | -0.6643502 | 1.8905146 |
| PWY_6629 | -0.2677474 | 0 | -0.0508378 | 0.3895271 | 0.1611869 | 0.4141907 | 0.0707759 | 0.7205976 | 0.2515979 | 0.4270884 | 0.1808220 | 0.6063980 | -0.2291330 | 0.5515069 | -0.3210834 | 0.4626353 | -0.3748691 | 0.8780649 |
| PWY_1269 | 0.0542379 | 0 | -0.0743909 | 0.4739454 | 0.2771687 | 0.4248289 | -0.0012748 | 0.9970775 | 0.5556122 | 0.3193538 | 0.5568870 | 0.3678700 | -0.4095946 | 0.9639319 | -0.6907467 | 0.6881970 | -0.5466489 | 1.6578733 |
| GALACTITOLCAT_PWY | -0.7587257 | 0 | -0.1317090 | 0.4444355 | 0.4407269 | 0.4438474 | -0.0974639 | 0.8658540 | 0.9789177 | 0.2900121 | 1.0763816 | 0.2940863 | -0.6976062 | 1.5790601 | -1.2402865 | 1.0453588 | -0.8481171 | 2.8059525 |
| HEXITOLDEGSUPER_PWY | -0.6644502 | 0 | -0.0964017 | 0.5397155 | 0.3674010 | 0.4844670 | -0.0815175 | 0.8770494 | 0.8163194 | 0.3337190 | 0.8978369 | 0.3376733 | -0.6721866 | 1.4069885 | -1.1252051 | 0.9621701 | -0.8522278 | 2.4848666 |
| NONOXIPENT_PWY | -0.1145726 | 0 | -0.0847598 | 0.2142740 | 0.1431280 | 0.5290129 | 0.0976397 | 0.6686444 | 0.1886162 | 0.6051177 | 0.0909765 | 0.8219570 | -0.3067688 | 0.5930247 | -0.3540314 | 0.5493108 | -0.5334720 | 0.9107045 |
| PROPFERM_PWY | -0.6354697 | 0 | 0.0042961 | 0.9746762 | 0.2765237 | 0.5413465 | 0.1743986 | 0.7010282 | 0.3786488 | 0.6022046 | 0.2042502 | 0.7997380 | -0.6193986 | 1.1724460 | -0.7250571 | 1.0738544 | -1.0593144 | 1.8166120 |
| PWY_8188 | -0.6354697 | 0 | 0.0042961 | 0.9746762 | 0.2765237 | 0.5413465 | 0.1743986 | 0.7010282 | 0.3786488 | 0.6022046 | 0.2042502 | 0.7997380 | -0.6193986 | 1.1724460 | -0.7250571 | 1.0738544 | -1.0593144 | 1.8166120 |
| PWY_8189 | -0.6354697 | 0 | 0.0042961 | 0.9746762 | 0.2765237 | 0.5413465 | 0.1743986 | 0.7010282 | 0.3786488 | 0.6022046 | 0.2042502 | 0.7997380 | -0.6193986 | 1.1724460 | -0.7250571 | 1.0738544 | -1.0593144 | 1.8166120 |
| PWY_6293 | -0.6016035 | 0 | 0.1982998 | 0.1229276 | -0.1430936 | 0.7374860 | -0.6189605 | 0.1508899 | 0.3327732 | 0.6273239 | 0.9517337 | 0.2121950 | -0.9885097 | 0.7023224 | -1.4677108 | 0.2297898 | -1.0241270 | 1.6896734 |
| PWY_8178 | -0.0496325 | 0 | -0.0790724 | 0.2486419 | 0.0534566 | 0.8148275 | 0.0533209 | 0.8160012 | 0.0535923 | 0.8836815 | 0.0002714 | 0.9994669 | -0.3986201 | 0.5055333 | -0.4005387 | 0.5071806 | -0.6719948 | 0.7791794 |
| PWY_801 | -0.6715556 | 0 | 0.2036234 | 0.1236193 | -0.0705987 | 0.8722625 | -0.6203486 | 0.1615842 | 0.4791513 | 0.4970850 | 1.0994999 | 0.1616967 | -0.9403352 | 0.7991378 | -1.4935153 | 0.2528181 | -0.9167835 | 1.8750860 |
| CALVIN_PWY | -0.0743894 | 0 | -0.0533955 | 0.3990599 | 0.0301266 | 0.8865682 | 0.0180314 | 0.9322252 | 0.0422218 | 0.9008715 | 0.0241904 | 0.9486865 | -0.3882129 | 0.4484661 | -0.4019580 | 0.4380208 | -0.6292167 | 0.7136603 |
| HSERMETANA_PWY | 0.0093006 | 0 | 0.0204975 | 0.7260141 | -0.0029857 | 0.9878091 | 0.0471015 | 0.8102860 | -0.0530730 | 0.8656333 | -0.1001744 | 0.7733788 | -0.3900740 | 0.3841025 | -0.3415134 | 0.4357163 | -0.6743529 | 0.5682070 |
| PWY_5971 | 0.7845608 | 0 | -0.0940461 | 0.5469506 | 0.0002204 | 0.9996626 | -0.0740198 | 0.8875266 | 0.0744606 | 0.9290834 | 0.1484804 | 0.8728599 | -1.0323683 | 1.0328091 | -1.1106809 | 0.9626414 | -1.5828533 | 1.7317746 |
Open code
if(file.exists('gitignore/lm_results/result_pathways_VGdur_training.csv') == FALSE){
write.table(result_pathways,
'gitignore/lm_results/result_pathways_VGdur_training.csv', row.names = FALSE)
}7.3 Validation cohort
Open code
data_analysis_pathways <- data_pathways_valid2 %>%
filter(Diet == 'VEGAN',
!is.na(Diet_duration)) %>%
dplyr::mutate(Diet_duration = as.numeric(as.character(Diet_duration))) %>%
dplyr::select(
Diet_duration, Age,
all_of(diet_sensitive_pathways$outcome)
)
summary(data_analysis_pathways[ , 1:12])
## Diet_duration Age CALVIN_PWY GALACTITOLCAT_PWY
## Min. : 0.170 Min. :21.54 Min. :2.242 Min. :-5.686
## 1st Qu.: 4.062 1st Qu.:30.05 1st Qu.:2.720 1st Qu.:-5.191
## Median : 6.000 Median :32.81 Median :2.883 Median :-4.437
## Mean : 6.790 Mean :32.37 Mean :2.865 Mean :-3.758
## 3rd Qu.:10.000 3rd Qu.:34.24 3rd Qu.:3.052 3rd Qu.:-2.130
## Max. :15.000 Max. :44.08 Max. :3.372 Max. : 1.556
## HEXITOLDEGSUPER_PWY HSERMETANA_PWY NONOXIPENT_PWY PROPFERM_PWY
## Min. :-4.440 Min. :1.852 Min. :1.932 Min. :-3.650
## 1st Qu.:-3.945 1st Qu.:2.325 1st Qu.:2.398 1st Qu.:-3.072
## Median :-3.199 Median :2.451 Median :2.629 Median :-2.934
## Mean :-2.646 Mean :2.443 Mean :2.610 Mean :-2.836
## 3rd Qu.:-1.018 3rd Qu.:2.602 3rd Qu.:2.832 3rd Qu.:-2.763
## Max. : 0.303 Max. :3.075 Max. :3.384 Max. :-1.320
## PWY_1269 PWY_5104 PWY_5367 PWY_5747
## Min. :-1.5780 Min. :-4.596 Min. :-5.0226 Min. :-6.481
## 1st Qu.: 0.3895 1st Qu.:-4.332 1st Qu.:-4.3303 1st Qu.:-6.231
## Median : 0.7839 Median :-4.173 Median :-2.6540 Median :-6.127
## Mean : 0.8536 Mean :-3.878 Mean :-2.6949 Mean :-5.946
## 3rd Qu.: 1.2113 3rd Qu.:-3.994 3rd Qu.:-1.8717 3rd Qu.:-6.027
## Max. : 2.3199 Max. :-1.523 Max. : 0.7546 Max. :-2.9817.3.0.1 Define number of pathways and covariates
Open code
n_covarites <- 2
n_features <- ncol(data_analysis_pathways) - n_covarites7.3.0.2 Create empty objects
Open code
outcome <- vector('double', n_features)
est_VGduration <- vector('double', n_features)
P_VGduration <- vector('double', n_features)
est_Age <- vector('double', n_features)
P_Age <- vector('double', n_features)
CI_L_VGduration <- vector('double', n_features)
CI_U_VGduration <- vector('double', n_features)7.3.0.3 Estimate over outcomes
Open code
for (i in 1:n_features) {
## define variable
data_analysis_pathways$outcome <- data_analysis_pathways[, (i + n_covarites)]
## fit model
model <- lm(outcome ~ Diet_duration + Age,
data = data_analysis_pathways %>%
mutate(Diet_duration = (Diet_duration)/10,
Age = (Age-33)/10))
## save results
outcome[i] <- names(data_analysis_pathways)[i + n_covarites]
## diet effect
tr <- confint(model)
CI_L_VGduration[i] <- tr[which(row.names(tr) == "Diet_duration"), ][1]
CI_U_VGduration[i] <- tr[which(row.names(tr) == "Diet_duration"), ][2]
est_VGduration[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Diet_duration"
), 1
]
P_VGduration[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Diet_duration"
), 4
]
est_Age[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Age"
), 1
]
P_Age[i] <- summary(model)$coefficients[
which(
names(model$coefficients) == "Age"
), 4
]
}7.3.0.4 Results table
Open code
result_pathways_val <- data.frame(
outcome,
est_VGduration, P_VGduration,
CI_L_VGduration, CI_U_VGduration,
est_Age, P_Age
)
kableExtra::kable(result_pathways_val %>%
arrange(P_VGduration),
caption = "Results of linear models estimating the effect of vegan diet status duration and age on clr-transformed pathway proportions (functional pathways inferred from shotgun metagenomic sequencing). Only pathways with significant differences between vegans and omnivores in training cohorts (FDR < 0.05, average effect over both countries) were included. `est`: regression coefficient, i.e. expected change in clr-transformed pathway proportion per +10 years of vegan diet and +10 years of age respectively. `P`: p-value; `CI_L` and `CI_U`: lower and upper bounds of the 95% confidence interval. All estimates in a single row are based on a single model"
)| outcome | est_VGduration | P_VGduration | CI_L_VGduration | CI_U_VGduration | est_Age | P_Age |
|---|---|---|---|---|---|---|
| PWY0_1586 | -0.2034972 | 0.1217437 | -0.4631027 | 0.0561082 | -0.1596314 | 0.1513086 |
| PWY_5971 | 0.4404654 | 0.1852968 | -0.2180837 | 1.0990145 | 0.3086420 | 0.2720620 |
| PWY_8188 | -0.2530201 | 0.2466083 | -0.6864001 | 0.1803598 | -0.0053484 | 0.9767893 |
| PROPFERM_PWY | -0.2530201 | 0.2466083 | -0.6864001 | 0.1803598 | -0.0053484 | 0.9767893 |
| PWY_8189 | -0.2530201 | 0.2466083 | -0.6864001 | 0.1803598 | -0.0053484 | 0.9767893 |
| PWY_6284 | 0.4282847 | 0.4363589 | -0.6676821 | 1.5242515 | 0.3688943 | 0.4288990 |
| PWY_5367 | 0.5092329 | 0.4401775 | -0.8048535 | 1.8233193 | 0.4477825 | 0.4232473 |
| PWY_5104 | 0.2435981 | 0.4856992 | -0.4527699 | 0.9399660 | 0.1381028 | 0.6404712 |
| PWY_5747 | -0.2012648 | 0.5319405 | -0.8433037 | 0.4407740 | 0.7033059 | 0.0123152 |
| NONOXIPENT_PWY | -0.0511207 | 0.6947732 | -0.3111956 | 0.2089541 | -0.3182347 | 0.0055092 |
| TRPSYN_PWY | -0.0291099 | 0.8042250 | -0.2636445 | 0.2054247 | -0.2272285 | 0.0258614 |
| PWY_1269 | 0.0913651 | 0.8077977 | -0.6587243 | 0.8414546 | -0.5837282 | 0.0710437 |
| PWY_8178 | -0.0294339 | 0.8122229 | -0.2768912 | 0.2180233 | -0.3027070 | 0.0055220 |
| HSERMETANA_PWY | -0.0245491 | 0.8165366 | -0.2358853 | 0.1867871 | -0.2385405 | 0.0100427 |
| PWY_801 | 0.0487496 | 0.8200686 | -0.3793120 | 0.4768112 | 0.2074197 | 0.2563364 |
| PWY_6293 | 0.0484819 | 0.8204130 | -0.3780595 | 0.4750232 | 0.2063192 | 0.2571641 |
| PWY_7237 | 0.0314140 | 0.8652072 | -0.3382422 | 0.4010702 | -0.4885971 | 0.0028790 |
| PWY0_42 | 0.0433593 | 0.8785533 | -0.5234589 | 0.6101774 | 0.3927582 | 0.1068248 |
| CALVIN_PWY | -0.0101717 | 0.9288856 | -0.2378529 | 0.2175096 | -0.2283578 | 0.0212871 |
| PWY_6629 | 0.0075671 | 0.9437346 | -0.2066243 | 0.2217585 | -0.1913790 | 0.0391760 |
| HEXITOLDEGSUPER_PWY | -0.0226809 | 0.9746535 | -1.4487920 | 1.4034301 | 0.1999101 | 0.7411747 |
| GALACTITOLCAT_PWY | 0.0239274 | 0.9768329 | -1.6221390 | 1.6699939 | 0.2729324 | 0.6960884 |
Open code
if (file.exists("gitignore/result_pathways_VGdur_valid.csv") == FALSE) {
write.table(result_pathways_val,
"gitignore/result_pathways_VGdur_valid.csv",
row.names = FALSE
)
}7.4 Forest plot
7.4.1 Prepare data
Open code
## subset result tables
result_pathways_subset <- result_pathways %>%
filter(outcome %in% diet_sensitive_pathways$outcome)
result_pathways_val_subset <- result_pathways_val %>%
filter(outcome %in% diet_sensitive_pathways$outcome)
## create a data frame
data_forest <- data.frame(
outcome = rep(diet_sensitive_pathways$outcome, 3),
beta = c(
result_pathways_subset$est_Diet_duration_inCZ,
result_pathways_subset$est_Diet_duration_inIT,
result_pathways_val_subset$est_VGduration
),
lower = c(
result_pathways_subset$CI_L_Diet_duration_inCZ,
result_pathways_subset$CI_L_Diet_duration_inIT,
result_pathways_val_subset$CI_L_VGduration
),
upper = c(
result_pathways_subset$CI_U_Diet_duration_inCZ,
result_pathways_subset$CI_U_Diet_duration_inIT,
result_pathways_val_subset$CI_U_VGduration
),
dataset = c(
rep("CZ", len),
rep("IT", len),
rep("Validation", len)
)
)
data_forest <- data_forest %>%
mutate(in_winner = if_else(outcome %in% winners, TRUE, FALSE, missing = FALSE)) %>%
left_join(
val_res %>% select(outcome, pathway),
by = 'outcome') %>%
left_join(
elacoef %>% mutate(outcome = pathway) %>% select(-pathway),
by = 'outcome') %>%
mutate(outcome = factor(pathway, levels = validation_order))7.4.1.1 Plotting
Open code
plotac <- "forest_plot_pathways_VGduration"
path <- "gitignore/figures"
colors <- c("CZ" = "#150999", "IT" = "#329243", "Validation" = "grey60")
assign(plotac, ggplot(
data_forest, aes(x = outcome, y = beta, ymin = lower, ymax = upper, color = dataset)
) +
geom_pointrange(position = position_dodge(width = 0.5), size = 0.5) +
geom_hline(yintercept = 0, color = "black") +
geom_errorbar(position = position_dodge(width = 0.5), width = 0.2) +
scale_color_manual(values = colors) +
labs(
y = "Effect of vegan diet duration (+10y) on CLR(relative level)",
x = "Outcome",
color = "Dataset"
) +
theme_minimal() +
coord_flip() +
scale_x_discrete(
labels = setNames(
ifelse(data_forest$in_winner,
paste0("**", data_forest$outcome, "**"),
as.character(data_forest$outcome)
), data_forest$outcome
)
) +
theme(
axis.text.x = element_text(size = 10),
axis.text.y = ggtext::element_markdown(size = 10),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
legend.position = "bottom"
)
)
get(plotac)Diet_duration, Country, their interaction (Diet_duration:Country), and Age as fixed-effect predictors. Age was included as it correlates with Diet_duration and could act as a confounder. In the Czech validation cohort, Diet_duration and Age were fixed-effect predictors. Diet-sensitive pathways are shown in bold, in the same order as in the forest plot of vegan–omnivore differencesOpen code
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 9,
height = 13
)
}8 Summary info about Patways
8.1 Proportions of unmapped and unintegrated
8.1.1 Czech trainng cohort
Open code
StratfiedPaths_originalCZ <- read.delim(
"gitignore/data/pathw/Pathway_abundance_MetaCyc_CZ_humann.tsv",
header = TRUE, sep = "\t"
) %>%
filter(!grepl("\\|", X..Pathway))
dim(StratfiedPaths_originalCZ)
## [1] 580 89
all_paths <- colSums(StratfiedPaths_originalCZ[, -1])
unmapped <- StratfiedPaths_originalCZ[1, -1] / all_paths
unintegrated <- StratfiedPaths_originalCZ[2, -1] / all_paths
classified <- 1 - unintegrated - unmapped
kableExtra::kable(
dplyr::bind_rows(
quantile(as.numeric(unmapped), probs = c(0, 0.25, 0.5, 0.75, 1)),
quantile(as.numeric(unintegrated), probs = c(0, 0.25, 0.5, 0.75, 1)),
quantile(as.numeric(classified), probs = c(0, 0.25, 0.5, 0.75, 1))
) %>%
dplyr::mutate(
category = c("unmapped", "unintegrated", "classified")
) %>%
dplyr::select(category, everything()),
caption = "Summary statistics (minimum, 25th, 50th, 75th, and maximum) of the proportions of unmapped, unintegrated, and classified pathway reads across samples from the Czech training cohort. 'Unmapped': reads not aligned to UniRef90. 'Unintegrated': reads aligned to UniRef90 gene families but not assigned to any MetaCyc pathway. 'Classified': reads assigned to known MetaCyc pathways (unstratified community-level abundances)"
)| category | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unmapped | 0.1463292 | 0.1816258 | 0.2156915 | 0.2538069 | 0.3634129 |
| unintegrated | 0.5993433 | 0.7027560 | 0.7373852 | 0.7718285 | 0.8265473 |
| classified | 0.0271235 | 0.0416706 | 0.0438855 | 0.0460305 | 0.0594788 |
8.1.2 Italian training cohort
Open code
StratifiedPaths_originalIT <- read.delim(
"gitignore/data/pathw/Pathway_abundance_MetaCyc_IT_humann.tsv",
header = TRUE, sep = "\t"
) %>%
filter(!grepl("\\|", X..Pathway))
dim(StratifiedPaths_originalIT)
## [1] 488 79
all_paths <- colSums(StratifiedPaths_originalIT[, -1])
unmapped <- StratifiedPaths_originalIT[1, -1] / all_paths
unintegrated <- StratifiedPaths_originalIT[2, -1] / all_paths
classified <- 1 - unintegrated - unmapped
kableExtra::kable(
dplyr::bind_rows(
quantile(as.numeric(unmapped), probs = c(0, 0.25, 0.5, 0.75, 1)),
quantile(as.numeric(unintegrated), probs = c(0, 0.25, 0.5, 0.75, 1)),
quantile(as.numeric(classified), probs = c(0, 0.25, 0.5, 0.75, 1))
) %>%
dplyr::mutate(
category = c("unmapped", "unintegrated", "classified")
) %>%
dplyr::select(category, everything()),
caption = "Summary statistics (minimum, 25th, 50th, 75th, and maximum) of the proportions of unmapped, unintegrated, and classified pathway reads across samples from the Italian training cohort. 'Unmapped': reads not aligned to UniRef90. 'Unintegrated': reads aligned to UniRef90 gene families but not assigned to any MetaCyc pathway. 'Classified': reads assigned to known MetaCyc pathways (unstratified community-level abundances)"
)| category | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unmapped | 0.1291023 | 0.1754808 | 0.2275788 | 0.2617836 | 0.4390892 |
| unintegrated | 0.5217834 | 0.6929692 | 0.7215643 | 0.7704262 | 0.8291034 |
| classified | 0.0288690 | 0.0432038 | 0.0469259 | 0.0526642 | 0.0798983 |
8.1.3 Czech validation cohort
Open code
StratifiedPaths_validation <- read.delim(
"gitignore/data/pathw/Pathway_abundance_MetaCyc_Validation_humann.tsv",
header = TRUE, sep = "\t"
) %>%
filter(!grepl("\\|", X..Pathway))
dim(StratifiedPaths_validation)
## [1] 580 104
all_paths <- colSums(StratifiedPaths_validation[, -1])
unmapped <- StratifiedPaths_validation[1, -1] / all_paths
unintegrated <- StratifiedPaths_validation[2, -1] / all_paths
classified <- 1 - unintegrated - unmapped
kableExtra::kable(
dplyr::bind_rows(
quantile(as.numeric(unmapped), probs = c(0, 0.25, 0.5, 0.75, 1)),
quantile(as.numeric(unintegrated), probs = c(0, 0.25, 0.5, 0.75, 1)),
quantile(as.numeric(classified), probs = c(0, 0.25, 0.5, 0.75, 1))
) %>%
dplyr::mutate(
category = c("unmapped", "unintegrated", "classified")
) %>%
dplyr::select(category, everything()),
caption = "Summary statistics (minimum, 25th, 50th, 75th, and maximum) of the proportions of unmapped, unintegrated, and classified pathway reads across samples from the Czech validation cohort. 'Unmapped': reads not aligned to UniRef90. 'Unintegrated': reads aligned to UniRef90 gene families but not assigned to any MetaCyc pathway. 'Classified': reads assigned to known MetaCyc pathways (unstratified community-level abundances)"
)| category | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unmapped | 0.1363081 | 0.1825077 | 0.2098586 | 0.2292549 | 0.3303064 |
| unintegrated | 0.6290565 | 0.7209362 | 0.7419081 | 0.7664460 | 0.8230822 |
| classified | 0.0369937 | 0.0444025 | 0.0488882 | 0.0517536 | 0.0616683 |
Open code
data_path_validation <- read.delim(
"gitignore/data/pathw/Pathway_abundance_MetaCyc_Validation_humann.tsv",
header = TRUE, sep = "\t"
) %>%
filter(!grepl("\\|", X..Pathway)) %>%
select(X..Pathway)
dim(data_path_validation)
## [1] 580 18.2 Taxon-assigned vs. taxon-unassigned pathways
This refers to the proportion of reads that were functionally assigned to a MetaCyc pathway but whose taxonomic origin could not be determined, from the all classified reads.
8.2.1 Czech trainng cohort
Open code
Paths_originalCZ <- read.delim(
"gitignore/data/pathw/Pathway_abundance_MetaCyc_CZ_humann.tsv",
header = TRUE, sep = "\t", check.names = FALSE
) %>%
filter(
!grepl("UNMAPPED", `# Pathway`),
!grepl("UNINTEGRATED", `# Pathway`)
)
name_col <- names(Paths_originalCZ)[1]
unstrat <- Paths_originalCZ %>% filter(!grepl("\\|", `# Pathway`))
dim(unstrat)
## [1] 578 89
unstrat[1:10, 1:3]
## # Pathway
## 1 12DICHLORETHDEG-PWY: 1,2-dichloroethane degradation
## 2 1CMET2-PWY: folate transformations III (E. coli)
## 3 3-HYDROXYPHENYLACETATE-DEGRADATION-PWY: 4-hydroxyphenylacetate degradation
## 4 4-HYDROXYMANDELATE-DEGRADATION-PWY: 4-hydroxymandelate degradation
## 5 AEROBACTINSYN-PWY: aerobactin biosynthesis
## 6 ALLANTOINDEG-PWY: superpathway of allantoin degradation in yeast
## 7 ANAEROFRUCAT-PWY: homolactic fermentation
## 8 ANAGLYCOLYSIS-PWY: glycolysis III (from glucose)
## 9 ARG+POLYAMINE-SYN: superpathway of arginine and polyamine biosynthesis
## 10 ARGDEG-PWY: superpathway of L-arginine, putrescine, and 4-aminobutanoate degradation
## T36 T47
## 1 0.000 0.00000
## 2 7206.790 4995.96065
## 3 0.000 0.00000
## 4 0.000 0.00000
## 5 0.000 0.00000
## 6 0.000 22.69929
## 7 3694.568 2757.45106
## 8 5606.681 6779.42061
## 9 1129.840 467.06740
## 10 0.000 0.00000
strat <- Paths_originalCZ %>% filter(grepl("\\|", `# Pathway`))
stratCZ <- strat
dim(strat)
## [1] 20794 89
strat[1:10, 1:3]
## # Pathway
## 1 1CMET2-PWY: folate transformations III (E. coli)|g__Akkermansia.s__Akkermansia_muciniphila
## 2 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_finegoldii
## 3 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_onderdonkii
## 4 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_putredinis
## 5 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_sp_CAG_268
## 6 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_timonensis
## 7 1CMET2-PWY: folate transformations III (E. coli)|g__Anaerostipes.s__Anaerostipes_hadrus
## 8 1CMET2-PWY: folate transformations III (E. coli)|g__Bacteroides.s__Bacteroides_caccae
## 9 1CMET2-PWY: folate transformations III (E. coli)|g__Bacteroides.s__Bacteroides_cellulosilyticus
## 10 1CMET2-PWY: folate transformations III (E. coli)|g__Bacteroides.s__Bacteroides_clarus
## T36 T47
## 1 1365.836 0.000000
## 2 0.000 0.000000
## 3 0.000 0.000000
## 4 0.000 0.000000
## 5 0.000 0.000000
## 6 0.000 0.000000
## 7 0.000 13.076579
## 8 0.000 5.707278
## 9 0.000 0.000000
## 10 0.000 0.000000
strat_unclassified <- strat %>%
filter(grepl("\\|unclassified", `# Pathway`))
dim(strat_unclassified)
## [1] 425 89
strat_unclassified[1:10, 1:3]
## # Pathway
## 1 1CMET2-PWY: folate transformations III (E. coli)|unclassified
## 2 AEROBACTINSYN-PWY: aerobactin biosynthesis|unclassified
## 3 ALLANTOINDEG-PWY: superpathway of allantoin degradation in yeast|unclassified
## 4 ANAEROFRUCAT-PWY: homolactic fermentation|unclassified
## 5 ANAGLYCOLYSIS-PWY: glycolysis III (from glucose)|unclassified
## 6 ARG+POLYAMINE-SYN: superpathway of arginine and polyamine biosynthesis|unclassified
## 7 ARGDEG-PWY: superpathway of L-arginine, putrescine, and 4-aminobutanoate degradation|unclassified
## 8 ARGININE-SYN4-PWY: L-ornithine biosynthesis II|unclassified
## 9 ARGSYN-PWY: L-arginine biosynthesis I (via L-ornithine)|unclassified
## 10 ARGSYNBSUB-PWY: L-arginine biosynthesis II (acetyl cycle)|unclassified
## T36 T47
## 1 850.22127 735.377758
## 2 0.00000 0.000000
## 3 0.00000 0.000000
## 4 1236.08342 69.609446
## 5 1582.32882 1230.262897
## 6 53.57712 60.195263
## 7 0.00000 0.000000
## 8 17.32423 7.024764
## 9 1881.25353 1426.920220
## 10 1578.50191 1479.972027
quants <- quantile(
colSums(strat_unclassified[, -1]) / colSums(unstrat[, -1]),
probs = c(0, 0.25, 0.5, 0.75, 1)
)
quants_df <- as.data.frame(t(quants))
names(quants_df) <- c("0%", "25%", "50%", "75%", "100%")
quants_df$category <- "unclassified_within_taxon-assigned"
kableExtra::kable(
quants_df %>% select(category, everything()),
caption = "Summary statistics (minimum, 25th, 50th, 75th, maximum) of the proportion of pathway signal that was taxonomically un-assigned, relative to all classified pathways (Czech training cohort). 'Unclassified pathways': MetaCyc pathway abundances that HUMAnN could not attribute to specific taxa (rows labeled '|unclassified' in the stratified output)"
)| category | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified_within_taxon-assigned | 0.024712 | 0.0665779 | 0.0909481 | 0.1279512 | 0.3327366 |
8.2.2 Italian trainng cohort
Open code
Paths_originalIT <- read.delim(
"gitignore/data/pathw/Pathway_abundance_MetaCyc_IT_humann.tsv",
header = TRUE, sep = "\t", check.names = FALSE
) %>%
filter(
!grepl("UNMAPPED", `# Pathway`),
!grepl("UNINTEGRATED", `# Pathway`)
)
name_col <- names(Paths_originalCZ)[1]
unstrat <- Paths_originalIT %>% filter(!grepl("\\|", `# Pathway`))
dim(unstrat)
## [1] 486 79
unstrat[1:10, 1:3]
## # Pathway
## 1 12DICHLORETHDEG-PWY: 1,2-dichloroethane degradation
## 2 1CMET2-PWY: folate transformations III (E. coli)
## 3 3-HYDROXYPHENYLACETATE-DEGRADATION-PWY: 4-hydroxyphenylacetate degradation
## 4 AEROBACTINSYN-PWY: aerobactin biosynthesis
## 5 ALLANTOINDEG-PWY: superpathway of allantoin degradation in yeast
## 6 ANAEROFRUCAT-PWY: homolactic fermentation
## 7 ANAGLYCOLYSIS-PWY: glycolysis III (from glucose)
## 8 ARG+POLYAMINE-SYN: superpathway of arginine and polyamine biosynthesis
## 9 ARGDEG-PWY: superpathway of L-arginine, putrescine, and 4-aminobutanoate degradation
## 10 ARGININE-SYN4-PWY: L-ornithine biosynthesis II
## VOV002 VOV003
## 1 0.00000 0.00000
## 2 5831.38667 15090.47182
## 3 10.16048 0.00000
## 4 0.00000 27.10151
## 5 0.00000 375.52244
## 6 2392.63624 6594.16155
## 7 5156.72228 14176.11481
## 8 1649.41125 3106.04621
## 9 73.37596 0.00000
## 10 3006.12980 5661.73935
strat <- Paths_originalIT %>% filter(grepl("\\|", `# Pathway`))
stratIT <- strat
dim(strat)
## [1] 18706 79
strat[1:10, 1:3]
## # Pathway
## 1 1CMET2-PWY: folate transformations III (E. coli)|g__Akkermansia.s__Akkermansia_muciniphila
## 2 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_finegoldii
## 3 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_onderdonkii
## 4 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_putredinis
## 5 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_sp_An66
## 6 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_sp_CAG_268
## 7 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_timonensis
## 8 1CMET2-PWY: folate transformations III (E. coli)|g__Anaerostipes.s__Anaerostipes_hadrus
## 9 1CMET2-PWY: folate transformations III (E. coli)|g__Bacteroides.s__Bacteroides_caccae
## 10 1CMET2-PWY: folate transformations III (E. coli)|g__Bacteroides.s__Bacteroides_cellulosilyticus
## VOV002 VOV003
## 1 0.00000 268.83988
## 2 49.94327 322.46788
## 3 16.31622 0.00000
## 4 389.90038 623.18145
## 5 0.00000 0.00000
## 6 843.30879 0.00000
## 7 0.00000 0.00000
## 8 0.00000 0.00000
## 9 36.86016 91.25667
## 10 0.00000 0.00000
strat_unclassified <- strat %>%
filter(grepl("\\|unclassified", `# Pathway`))
dim(strat_unclassified)
## [1] 440 79
strat_unclassified[1:10, 1:3]
## # Pathway
## 1 1CMET2-PWY: folate transformations III (E. coli)|unclassified
## 2 3-HYDROXYPHENYLACETATE-DEGRADATION-PWY: 4-hydroxyphenylacetate degradation|unclassified
## 3 ALLANTOINDEG-PWY: superpathway of allantoin degradation in yeast|unclassified
## 4 ANAEROFRUCAT-PWY: homolactic fermentation|unclassified
## 5 ANAGLYCOLYSIS-PWY: glycolysis III (from glucose)|unclassified
## 6 ARG+POLYAMINE-SYN: superpathway of arginine and polyamine biosynthesis|unclassified
## 7 ARGDEG-PWY: superpathway of L-arginine, putrescine, and 4-aminobutanoate degradation|unclassified
## 8 ARGININE-SYN4-PWY: L-ornithine biosynthesis II|unclassified
## 9 ARGSYN-PWY: L-arginine biosynthesis I (via L-ornithine)|unclassified
## 10 ARGSYNBSUB-PWY: L-arginine biosynthesis II (acetyl cycle)|unclassified
## VOV002 VOV003
## 1 592.735952 1831.3830
## 2 8.815428 0.0000
## 3 0.000000 0.0000
## 4 211.424264 819.0017
## 5 453.479555 1874.8780
## 6 65.875697 438.1747
## 7 20.241094 0.0000
## 8 192.409181 200.4070
## 9 603.938329 4149.4989
## 10 636.519706 4733.9368
quants <- quantile(
colSums(strat_unclassified[, -1]) / colSums(unstrat[, -1]),
probs = c(0, 0.25, 0.5, 0.75, 1)
)
quants_df <- as.data.frame(t(quants))
names(quants_df) <- c("0%", "25%", "50%", "75%", "100%")
quants_df$category <- "unclassified_within_taxon-assigned"
kableExtra::kable(
quants_df %>% select(category, everything()),
caption = "Summary statistics (minimum, 25th, 50th, 75th, maximum) of the proportion of pathway signal that was taxonomically un-assigned, relative to all classified pathways (Italian training cohort). 'Unclassified pathways': MetaCyc pathway abundances that HUMAnN could not attribute to specific taxa (rows labeled '|unclassified' in the stratified output)"
)| category | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified_within_taxon-assigned | 0.0357561 | 0.0851309 | 0.1141805 | 0.1542638 | 0.4046326 |
8.2.3 Validation cohort
Open code
Paths_validation <- read.delim(
"gitignore/data/pathw/Pathway_abundance_MetaCyc_Validation_humann.tsv",
header = TRUE, sep = "\t", check.names = FALSE
) %>%
filter(
!grepl("UNMAPPED", `# Pathway`),
!grepl("UNINTEGRATED", `# Pathway`)
)
unstrat <- Paths_validation %>% filter(!grepl("\\|", `# Pathway`))
dim(unstrat)
## [1] 578 104
unstrat[1:10, 1:3]
## # Pathway
## 1 12DICHLORETHDEG-PWY: 1,2-dichloroethane degradation
## 2 1CMET2-PWY: folate transformations III (E. coli)
## 3 3-HYDROXYPHENYLACETATE-DEGRADATION-PWY: 4-hydroxyphenylacetate degradation
## 4 4-HYDROXYMANDELATE-DEGRADATION-PWY: 4-hydroxymandelate degradation
## 5 AEROBACTINSYN-PWY: aerobactin biosynthesis
## 6 ALLANTOINDEG-PWY: superpathway of allantoin degradation in yeast
## 7 ANAEROFRUCAT-PWY: homolactic fermentation
## 8 ANAGLYCOLYSIS-PWY: glycolysis III (from glucose)
## 9 ARG+POLYAMINE-SYN: superpathway of arginine and polyamine biosynthesis
## 10 ARGDEG-PWY: superpathway of L-arginine, putrescine, and 4-aminobutanoate degradation
## 4 5
## 1 0.0000 0.0000
## 2 6827.5671 6081.1138
## 3 0.0000 0.0000
## 4 0.0000 0.0000
## 5 0.0000 0.0000
## 6 0.0000 0.0000
## 7 2021.6986 4361.0989
## 8 8939.6788 8361.6527
## 9 796.0169 316.3769
## 10 0.0000 0.0000
strat <- Paths_validation %>% filter(grepl("\\|", `# Pathway`))
stratVAL <- strat
dim(strat)
## [1] 20794 104
strat[1:10, 1:3]
## # Pathway
## 1 1CMET2-PWY: folate transformations III (E. coli)|g__Akkermansia.s__Akkermansia_muciniphila
## 2 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_finegoldii
## 3 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_onderdonkii
## 4 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_putredinis
## 5 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_sp_CAG_268
## 6 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_timonensis
## 7 1CMET2-PWY: folate transformations III (E. coli)|g__Anaerostipes.s__Anaerostipes_hadrus
## 8 1CMET2-PWY: folate transformations III (E. coli)|g__Bacteroides.s__Bacteroides_caccae
## 9 1CMET2-PWY: folate transformations III (E. coli)|g__Bacteroides.s__Bacteroides_cellulosilyticus
## 10 1CMET2-PWY: folate transformations III (E. coli)|g__Bacteroides.s__Bacteroides_clarus
## 4 5
## 1 30.77154 0.00000
## 2 0.00000 0.00000
## 3 0.00000 0.00000
## 4 73.75813 204.65122
## 5 0.00000 0.00000
## 6 0.00000 0.00000
## 7 54.62160 37.22825
## 8 0.00000 0.00000
## 9 0.00000 0.00000
## 10 0.00000 0.00000
strat_unclassified <- strat %>%
filter(grepl("\\|unclassified", `# Pathway`))
dim(strat_unclassified)
## [1] 425 104
strat_unclassified[1:10, 1:3]
## # Pathway
## 1 1CMET2-PWY: folate transformations III (E. coli)|unclassified
## 2 AEROBACTINSYN-PWY: aerobactin biosynthesis|unclassified
## 3 ALLANTOINDEG-PWY: superpathway of allantoin degradation in yeast|unclassified
## 4 ANAEROFRUCAT-PWY: homolactic fermentation|unclassified
## 5 ANAGLYCOLYSIS-PWY: glycolysis III (from glucose)|unclassified
## 6 ARG+POLYAMINE-SYN: superpathway of arginine and polyamine biosynthesis|unclassified
## 7 ARGDEG-PWY: superpathway of L-arginine, putrescine, and 4-aminobutanoate degradation|unclassified
## 8 ARGININE-SYN4-PWY: L-ornithine biosynthesis II|unclassified
## 9 ARGSYN-PWY: L-arginine biosynthesis I (via L-ornithine)|unclassified
## 10 ARGSYNBSUB-PWY: L-arginine biosynthesis II (acetyl cycle)|unclassified
## 4 5
## 1 822.03841 701.9697
## 2 0.00000 0.0000
## 3 0.00000 0.0000
## 4 179.78691 259.8946
## 5 845.21715 793.1012
## 6 0.00000 0.0000
## 7 0.00000 0.0000
## 8 89.80743 0.0000
## 9 1308.48866 1227.9350
## 10 1658.82435 1324.9933
quants <- quantile(
colSums(strat_unclassified[, -1]) / colSums(unstrat[, -1]),
probs = c(0, 0.25, 0.5, 0.75, 1)
)
quants_df <- as.data.frame(t(quants))
names(quants_df) <- c("0%", "25%", "50%", "75%", "100%")
quants_df$category <- "unclassified_within_taxon-assigned"
kableExtra::kable(
quants_df %>% select(category, everything()),
caption = "Summary statistics (minimum, 25th, 50th, 75th, maximum) of the proportion of pathway signal that was taxonomically un-assigned, relative to all classified pathways (Czech validation cohort). 'Unclassified pathways': MetaCyc pathway abundances that HUMAnN could not attribute to specific taxa (rows labeled '|unclassified' in the stratified output)"
)| category | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified_within_taxon-assigned | 0.0259304 | 0.0801659 | 0.1083844 | 0.1322093 | 0.2306524 |
8.3 Taxa contribution to diet-sensitive pathways
8.3.1 PWY-5367: petroselinate biosynthesis
8.3.1.1 Czech training cohort
Open code
res <- stratCZ %>%
filter(grepl("PWY-5367", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-5367: petroselinate biosynthesis` in the Czech training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| g__Blautia.s__Ruminococcus_torques | 0 | 0.1836 | 0.9822 | 1.0000 | 1.0000 |
| g__Escherichia.s__Escherichia_coli | 0 | 0.0000 | 0.0000 | 0.0596 | 1.0000 |
| g__Citrobacter.s__Citrobacter_freundii | 0 | 0.0000 | 0.0000 | 0.0000 | 1.0000 |
| unclassified | 0 | 0.0000 | 0.0000 | 0.0000 | 1.0000 |
| g__Enterococcus.s__Enterococcus_faecalis | 0 | 0.0000 | 0.0000 | 0.0000 | 0.3800 |
| g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0.0000 | 0.0000 | 0.0000 | 0.1882 |
| g__Enterobacter.s__Enterobacter_roggenkampii | 0 | 0.0000 | 0.0000 | 0.0000 | 0.1190 |
| g__Streptococcus.s__Streptococcus_pneumoniae | 0 | 0.0000 | 0.0000 | 0.0000 | 0.1078 |
| g__Escherichia.s__Escherichia_fergusonii | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0843 |
| g__Escherichia.s__Escherichia_marmotae | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0726 |
8.3.1.2 Italian training cohort
Open code
res <- stratIT %>%
filter(grepl("PWY-5367", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-5367: petroselinate biosynthesis` in the Italian training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| g__Escherichia.s__Escherichia_coli | 0 | 0 | 0.6316 | 1 | 1.0000 |
| g__Blautia.s__Ruminococcus_torques | 0 | 0 | 0.0000 | 0 | 1.0000 |
| g__Escherichia.s__Escherichia_marmotae | 0 | 0 | 0.0000 | 0 | 1.0000 |
| g__Leclercia.s__Leclercia_adecarboxylata | 0 | 0 | 0.0000 | 0 | 1.0000 |
| unclassified | 0 | 0 | 0.0000 | 0 | 1.0000 |
| g__Enterobacter.s__Enterobacter_roggenkampii | 0 | 0 | 0.0000 | 0 | 0.9279 |
| g__Enterobacter.s__Enterobacter_mori | 0 | 0 | 0.0000 | 0 | 0.7381 |
| g__Citrobacter.s__Citrobacter_freundii | 0 | 0 | 0.0000 | 0 | 0.7378 |
| g__Citrobacter.s__Citrobacter_braakii | 0 | 0 | 0.0000 | 0 | 0.4592 |
| g__Klebsiella.s__Klebsiella_oxytoca | 0 | 0 | 0.0000 | 0 | 0.4285 |
8.3.1.3 Czech validation cohort
Open code
res <- stratVAL %>%
filter(grepl("PWY-5367", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-5367: petroselinate biosynthesis` in the Czech validation cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| g__Blautia.s__Ruminococcus_torques | 0 | 1 | 1 | 1 | 1.0000 |
| unclassified | 0 | 0 | 0 | 0 | 1.0000 |
| g__Enterococcus.s__Enterococcus_hirae | 0 | 0 | 0 | 0 | 0.8935 |
| g__Enterobacter.s__Enterobacter_bugandensis | 0 | 0 | 0 | 0 | 0.8379 |
| g__Escherichia.s__Escherichia_coli | 0 | 0 | 0 | 0 | 0.6883 |
| g__Enterococcus.s__Enterococcus_faecium | 0 | 0 | 0 | 0 | 0.5535 |
| g__Enterococcus.s__Enterococcus_durans | 0 | 0 | 0 | 0 | 0.4465 |
| g__Klebsiella.s__Klebsiella_oxytoca | 0 | 0 | 0 | 0 | 0.3625 |
| g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0 | 0 | 0 | 0.3454 |
| g__Escherichia.s__Escherichia_fergusonii | 0 | 0 | 0 | 0 | 0.3117 |
8.3.2 NONOXIPENT-PWY: pentose phosphate pathway (non-oxidative branch) I
8.3.2.1 Czech training cohort
Open code
res <- stratCZ %>%
filter(grepl("NONOXIPENT-PWY", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `NONOXIPENT-PWY: pentose phosphate pathway (non-oxidative branch) I` in the Czech training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified | 0.0535 | 0.1265 | 0.2137 | 0.2827 | 0.6020 |
| g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0.0000 | 0.0125 | 0.0417 | 0.1063 | 0.2775 |
| g__Blautia.s__Ruminococcus_torques | 0.0000 | 0.0131 | 0.0404 | 0.0732 | 0.1831 |
| g__Eubacterium.s__Eubacterium_eligens | 0.0000 | 0.0000 | 0.0299 | 0.0727 | 0.4486 |
| g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0.0000 | 0.0123 | 0.0266 | 0.0518 | 0.1936 |
| g__Anaerostipes.s__Anaerostipes_hadrus | 0.0000 | 0.0057 | 0.0208 | 0.0565 | 0.3011 |
| g__Roseburia.s__Roseburia_faecis | 0.0000 | 0.0000 | 0.0160 | 0.0556 | 0.5527 |
| g__Eubacterium.s__Eubacterium_hallii | 0.0000 | 0.0015 | 0.0136 | 0.0295 | 0.0907 |
| g__Parabacteroides.s__Parabacteroides_distasonis | 0.0000 | 0.0000 | 0.0109 | 0.0300 | 0.2507 |
| g__Collinsella.s__Collinsella_aerofaciens | 0.0000 | 0.0000 | 0.0108 | 0.0306 | 0.1724 |
8.3.2.2 Italian training cohort
Open code
res <- stratIT %>%
filter(grepl("NONOXIPENT-PWY", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `NONOXIPENT-PWY: pentose phosphate pathway (non-oxidative branch) I` in the Italian training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified | 0.0061 | 0.2064 | 0.3930 | 0.5533 | 0.8199 |
| g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0.0000 | 0.0024 | 0.0196 | 0.0485 | 0.4152 |
| g__Parabacteroides.s__Parabacteroides_distasonis | 0.0000 | 0.0018 | 0.0169 | 0.0475 | 0.1819 |
| g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0.0000 | 0.0020 | 0.0167 | 0.0388 | 0.4557 |
| g__Collinsella.s__Collinsella_aerofaciens | 0.0000 | 0.0007 | 0.0159 | 0.0401 | 0.3181 |
| g__Roseburia.s__Roseburia_faecis | 0.0000 | 0.0000 | 0.0070 | 0.0316 | 0.2220 |
| g__Flavonifractor.s__Flavonifractor_plautii | 0.0000 | 0.0000 | 0.0031 | 0.0200 | 0.0997 |
| g__Agathobaculum.s__Agathobaculum_butyriciproducens | 0.0000 | 0.0000 | 0.0020 | 0.0121 | 0.0645 |
| g__Escherichia.s__Escherichia_coli | 0.0000 | 0.0000 | 0.0000 | 0.0717 | 0.7774 |
| g__Roseburia.s__Roseburia_hominis | 0.0000 | 0.0000 | 0.0000 | 0.0121 | 0.3341 |
8.3.2.3 Czech validation cohort
Open code
res <- stratVAL %>%
filter(grepl("NONOXIPENT-PWY", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `NONOXIPENT-PWY: pentose phosphate pathway (non-oxidative branch) I` in the Czech validation cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified | 0.0353 | 0.1040 | 0.1365 | 0.2026 | 0.4451 |
| g__Blautia.s__Ruminococcus_torques | 0.0000 | 0.0403 | 0.0730 | 0.0974 | 0.2555 |
| g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0.0000 | 0.0330 | 0.0591 | 0.1089 | 0.2600 |
| g__Roseburia.s__Roseburia_faecis | 0.0000 | 0.0047 | 0.0484 | 0.1113 | 0.3453 |
| g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0.0000 | 0.0196 | 0.0439 | 0.1011 | 0.3910 |
| g__Collinsella.s__Collinsella_aerofaciens | 0.0000 | 0.0264 | 0.0424 | 0.0685 | 0.2160 |
| g__Eubacterium.s__Eubacterium_hallii | 0.0000 | 0.0257 | 0.0365 | 0.0512 | 0.2285 |
| g__Blautia.s__Blautia_obeum | 0.0003 | 0.0218 | 0.0343 | 0.0457 | 0.1634 |
| g__Anaerostipes.s__Anaerostipes_hadrus | 0.0000 | 0.0198 | 0.0334 | 0.0540 | 0.2550 |
| g__Blautia.s__Blautia_wexlerae | 0.0000 | 0.0114 | 0.0210 | 0.0359 | 0.1837 |
8.3.3 TRPSYN-PWY: L-tryptophan biosynthesis
8.3.3.1 Czech training cohort
Open code
res <- stratCZ %>%
filter(grepl("TRPSYN-PWY", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `TRPSYN-PWY: L-tryptophan biosynthesis` in the Czech training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified | 0.0171 | 0.0970 | 0.1647 | 0.2772 | 0.5479 |
| g__Bacteroides.s__Bacteroides_vulgatus | 0.0000 | 0.0261 | 0.0950 | 0.2094 | 0.7458 |
| g__Eubacterium.s__Eubacterium_eligens | 0.0000 | 0.0000 | 0.0312 | 0.0656 | 0.2780 |
| g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0.0000 | 0.0040 | 0.0247 | 0.0743 | 0.3175 |
| g__Blautia.s__Ruminococcus_torques | 0.0000 | 0.0030 | 0.0209 | 0.0405 | 0.1779 |
| g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0.0000 | 0.0072 | 0.0179 | 0.0310 | 0.1868 |
| g__Anaerostipes.s__Anaerostipes_hadrus | 0.0000 | 0.0000 | 0.0130 | 0.0348 | 0.2000 |
| g__Ruminococcus.s__Ruminococcus_bromii | 0.0000 | 0.0000 | 0.0115 | 0.0505 | 0.3164 |
| g__Blautia.s__Blautia_obeum | 0.0000 | 0.0030 | 0.0098 | 0.0234 | 0.2721 |
| g__Firmicutes_unclassified.s__Firmicutes_bacterium_CAG_41 | 0.0000 | 0.0000 | 0.0095 | 0.0173 | 0.0639 |
8.3.3.2 Italian training cohort
Open code
res <- stratIT %>%
filter(grepl("TRPSYN-PWY", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `TRPSYN-PWY: L-tryptophan biosynthesis` in the Italian training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified | 0 | 0.1310 | 0.2364 | 0.3691 | 0.8064 |
| g__Bacteroides.s__Bacteroides_vulgatus | 0 | 0.0213 | 0.0920 | 0.2311 | 0.7756 |
| g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0 | 0.0000 | 0.0142 | 0.0382 | 0.2364 |
| g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0 | 0.0000 | 0.0128 | 0.0387 | 0.4200 |
| g__Lawsonibacter.s__Lawsonibacter_asaccharolyticus | 0 | 0.0000 | 0.0072 | 0.0258 | 0.4782 |
| g__Firmicutes_unclassified.s__Firmicutes_bacterium_CAG_65_45_313 | 0 | 0.0000 | 0.0000 | 0.0340 | 0.4960 |
| g__Faecalibacterium.s__Faecalibacterium_prausnitzii | 0 | 0.0000 | 0.0000 | 0.0207 | 0.4824 |
| g__Roseburia.s__Roseburia_hominis | 0 | 0.0000 | 0.0000 | 0.0188 | 0.1981 |
| g__Ruminococcus.s__Ruminococcus_bromii | 0 | 0.0000 | 0.0000 | 0.0187 | 0.0688 |
| g__Eubacterium.s__Eubacterium_eligens | 0 | 0.0000 | 0.0000 | 0.0179 | 0.0849 |
8.3.3.3 Czech validation cohort
Open code
res <- stratVAL %>%
filter(grepl("TRPSYN-PWY", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `TRPSYN-PWY: L-tryptophan biosynthesis` in the Czech validation cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified | 0.0166 | 0.0867 | 0.1270 | 0.2093 | 0.4686 |
| g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0.0000 | 0.0276 | 0.0511 | 0.0903 | 0.2820 |
| g__Blautia.s__Blautia_obeum | 0.0031 | 0.0310 | 0.0460 | 0.0701 | 0.2020 |
| g__Blautia.s__Ruminococcus_torques | 0.0000 | 0.0291 | 0.0455 | 0.0712 | 0.1757 |
| g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0.0000 | 0.0195 | 0.0408 | 0.0849 | 0.4596 |
| g__Bacteroides.s__Bacteroides_vulgatus | 0.0000 | 0.0135 | 0.0376 | 0.0684 | 0.2935 |
| g__Blautia.s__Blautia_wexlerae | 0.0018 | 0.0208 | 0.0305 | 0.0449 | 0.0889 |
| g__Anaerostipes.s__Anaerostipes_hadrus | 0.0000 | 0.0179 | 0.0293 | 0.0459 | 0.2038 |
| g__Roseburia.s__Roseburia_faecis | 0.0000 | 0.0000 | 0.0289 | 0.0737 | 0.2440 |
| g__Ruminococcus.s__Ruminococcus_bromii | 0.0000 | 0.0000 | 0.0204 | 0.1113 | 0.3487 |
8.3.4 PWY-8178: pentose phosphate pathway (non-oxidative branch) II
8.3.4.1 Czech training cohort
Open code
res <- stratCZ %>%
filter(grepl("PWY-8178", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-8178: pentose phosphate pathway (non-oxidative branch) II` in the Czech training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified | 0.0599 | 0.1435 | 0.2467 | 0.3486 | 0.6512 |
| g__Blautia.s__Ruminococcus_torques | 0.0000 | 0.0119 | 0.0459 | 0.0664 | 0.1660 |
| g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0.0000 | 0.0110 | 0.0388 | 0.0930 | 0.2692 |
| g__Eubacterium.s__Eubacterium_eligens | 0.0000 | 0.0044 | 0.0382 | 0.0794 | 0.4492 |
| g__Anaerostipes.s__Anaerostipes_hadrus | 0.0000 | 0.0078 | 0.0237 | 0.0634 | 0.3114 |
| g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0.0000 | 0.0089 | 0.0205 | 0.0428 | 0.1478 |
| g__Roseburia.s__Roseburia_inulinivorans | 0.0000 | 0.0000 | 0.0138 | 0.0365 | 0.2216 |
| g__Roseburia.s__Roseburia_faecis | 0.0000 | 0.0000 | 0.0137 | 0.0492 | 0.5145 |
| g__Blautia.s__Blautia_wexlerae | 0.0000 | 0.0056 | 0.0136 | 0.0266 | 0.2900 |
| g__Eubacterium.s__Eubacterium_hallii | 0.0000 | 0.0000 | 0.0135 | 0.0278 | 0.0714 |
8.3.4.2 Italian training cohort
Open code
res <- stratIT %>%
filter(grepl("PWY-8178", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-8178: pentose phosphate pathway (non-oxidative branch) II` in the Italian training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified | 0.0088 | 0.2572 | 0.4697 | 0.6587 | 0.9114 |
| g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0.0000 | 0.0010 | 0.0133 | 0.0367 | 0.3171 |
| g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0.0000 | 0.0014 | 0.0132 | 0.0455 | 0.3451 |
| g__Parabacteroides.s__Parabacteroides_distasonis | 0.0000 | 0.0000 | 0.0110 | 0.0337 | 0.2201 |
| g__Collinsella.s__Collinsella_aerofaciens | 0.0000 | 0.0000 | 0.0088 | 0.0293 | 0.2724 |
| g__Flavonifractor.s__Flavonifractor_plautii | 0.0000 | 0.0000 | 0.0052 | 0.0265 | 0.0888 |
| g__Roseburia.s__Roseburia_faecis | 0.0000 | 0.0000 | 0.0048 | 0.0231 | 0.1822 |
| g__Agathobaculum.s__Agathobaculum_butyriciproducens | 0.0000 | 0.0000 | 0.0028 | 0.0131 | 0.0538 |
| g__Roseburia.s__Roseburia_hominis | 0.0000 | 0.0000 | 0.0027 | 0.0122 | 0.2948 |
| g__Escherichia.s__Escherichia_coli | 0.0000 | 0.0000 | 0.0000 | 0.0374 | 0.7374 |
8.3.4.3 Czech validation cohort
Open code
res <- stratVAL %>%
filter(grepl("PWY-8178", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-8178: pentose phosphate pathway (non-oxidative branch) II` in the Czech validation cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified | 0.0413 | 0.1185 | 0.1669 | 0.2232 | 0.4912 |
| g__Blautia.s__Ruminococcus_torques | 0.0000 | 0.0426 | 0.0673 | 0.0986 | 0.2491 |
| g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0.0000 | 0.0291 | 0.0509 | 0.0966 | 0.2282 |
| g__Roseburia.s__Roseburia_faecis | 0.0000 | 0.0019 | 0.0444 | 0.0978 | 0.3274 |
| g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0.0000 | 0.0178 | 0.0397 | 0.0863 | 0.3465 |
| g__Blautia.s__Blautia_wexlerae | 0.0090 | 0.0227 | 0.0375 | 0.0535 | 0.2396 |
| g__Anaerostipes.s__Anaerostipes_hadrus | 0.0000 | 0.0234 | 0.0360 | 0.0635 | 0.2427 |
| g__Collinsella.s__Collinsella_aerofaciens | 0.0000 | 0.0210 | 0.0339 | 0.0548 | 0.1649 |
| g__Blautia.s__Blautia_obeum | 0.0004 | 0.0213 | 0.0330 | 0.0462 | 0.1463 |
| g__Eubacterium.s__Eubacterium_hallii | 0.0000 | 0.0222 | 0.0326 | 0.0457 | 0.2198 |
8.3.5 PWY-801: homocysteine and cysteine interconversion
8.3.5.1 Czech training cohort
Open code
res <- stratCZ %>%
filter(grepl("PWY-801", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-801: homocysteine and cysteine interconversion` in the Czech training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified | 0 | 1 | 1 | 1 | 1.0000 |
| g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0 | 0 | 0 | 1.0000 |
| g__Klebsiella.s__Klebsiella_variicola | 0 | 0 | 0 | 0 | 0.7376 |
8.3.5.2 Italian training cohort
Open code
res <- stratIT %>%
filter(grepl("PWY-801", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-801: homocysteine and cysteine interconversion` in the Italian training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified | 0 | 1 | 1 | 1 | 1.0000 |
| g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0 | 0 | 0 | 1.0000 |
| g__Escherichia.s__Escherichia_coli | 0 | 0 | 0 | 0 | 0.2929 |
| g__Klebsiella.s__Klebsiella_oxytoca | 0 | 0 | 0 | 0 | 0.0383 |
8.3.5.3 Czech validation cohort
Open code
res <- stratVAL %>%
filter(grepl("PWY-801", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-801: homocysteine and cysteine interconversion` in the Czech validation cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified | 0 | 1 | 1 | 1 | 1.0000 |
| g__Enterococcus.s__Enterococcus_gallinarum | 0 | 0 | 0 | 0 | 0.7631 |
| g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0 | 0 | 0 | 0.2465 |
| g__Enterococcus.s__Enterococcus_saccharolyticus | 0 | 0 | 0 | 0 | 0.2369 |
8.3.6 PWY-6293: superpathway of L-cysteine biosynthesis (fungi)
8.3.6.1 Czech training cohort
Open code
res <- stratCZ %>%
filter(grepl("PWY-6293", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-6293: superpathway of L-cysteine biosynthesis (fungi)` in the Czech training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified | 1 | 1 | 1 | 1 | 1 |
8.3.6.2 Italian training cohort
Open code
res <- stratIT %>%
filter(grepl("PWY-6293", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-6293: superpathway of L-cysteine biosynthesis (fungi)` in the Italian training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified | 1 | 1 | 1 | 1 | 1 |
8.3.6.3 Czech validation cohort
Open code
res <- stratVAL %>%
filter(grepl("PWY-6293", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-6293: superpathway of L-cysteine biosynthesis (fungi)` in the Czech validation cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| unclassified | 1 | 1 | 1 | 1 | 1 |
8.3.7 PWY-6284: superpathway of unsaturated fatty acids biosynthesis (E. coli)
8.3.7.1 Czech training cohort
Open code
res <- stratCZ %>%
filter(grepl("PWY-6284", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-6284: superpathway of unsaturated fatty acids biosynthesis (E. coli)` in the Czech training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| g__Escherichia.s__Escherichia_coli | 0 | 0 | 0.583 | 1 | 1.0000 |
| unclassified | 0 | 0 | 0.000 | 1 | 1.0000 |
| g__Enterococcus.s__Enterococcus_faecalis | 0 | 0 | 0.000 | 0 | 1.0000 |
| g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0 | 0.000 | 0 | 0.6643 |
| g__Escherichia.s__Escherichia_marmotae | 0 | 0 | 0.000 | 0 | 0.0502 |
8.3.7.2 Italian training cohort
Open code
res <- stratIT %>%
filter(grepl("PWY-6284", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-6284: superpathway of unsaturated fatty acids biosynthesis (E. coli)` in the Italian training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| g__Escherichia.s__Escherichia_coli | 0 | 0 | 1 | 1 | 1.0000 |
| g__Citrobacter.s__Citrobacter_braakii | 0 | 0 | 0 | 0 | 1.0000 |
| g__Klebsiella.s__Klebsiella_oxytoca | 0 | 0 | 0 | 0 | 1.0000 |
| g__Leclercia.s__Leclercia_adecarboxylata | 0 | 0 | 0 | 0 | 1.0000 |
| unclassified | 0 | 0 | 0 | 0 | 1.0000 |
| g__Citrobacter.s__Citrobacter_freundii | 0 | 0 | 0 | 0 | 0.9148 |
| g__Enterobacter.s__Enterobacter_mori | 0 | 0 | 0 | 0 | 0.6450 |
| g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0 | 0 | 0 | 0.4175 |
| g__Kosakonia.s__Kosakonia_sacchari | 0 | 0 | 0 | 0 | 0.2467 |
| g__Escherichia.s__Escherichia_fergusonii | 0 | 0 | 0 | 0 | 0.0186 |
8.3.7.3 Czech validation cohort
Open code
res <- stratVAL %>%
filter(grepl("PWY-6284", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-6284: superpathway of unsaturated fatty acids biosynthesis (E. coli)` in the Czech validation cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| g__Escherichia.s__Escherichia_coli | 0 | 0 | 0.2299 | 1.0 | 1.0000 |
| unclassified | 0 | 0 | 0.0000 | 0.5 | 1.0000 |
| g__Streptococcus.s__Streptococcus_oralis | 0 | 0 | 0.0000 | 0.0 | 1.0000 |
| g__Enterobacter.s__Enterobacter_bugandensis | 0 | 0 | 0.0000 | 0.0 | 0.7701 |
| g__Enterococcus.s__Enterococcus_faecium | 0 | 0 | 0.0000 | 0.0 | 0.5860 |
| g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0 | 0.0000 | 0.0 | 0.4828 |
| g__Enterococcus.s__Enterococcus_durans | 0 | 0 | 0.0000 | 0.0 | 0.4140 |
8.3.8 PWY-5367: petroselinate biosynthesis
8.3.8.1 Czech training cohort
Open code
res <- stratCZ %>%
filter(grepl("PWY-5367", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-5367: petroselinate biosynthesis` in the Czech training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| g__Blautia.s__Ruminococcus_torques | 0 | 0.1836 | 0.9822 | 1.0000 | 1.0000 |
| g__Escherichia.s__Escherichia_coli | 0 | 0.0000 | 0.0000 | 0.0596 | 1.0000 |
| g__Citrobacter.s__Citrobacter_freundii | 0 | 0.0000 | 0.0000 | 0.0000 | 1.0000 |
| unclassified | 0 | 0.0000 | 0.0000 | 0.0000 | 1.0000 |
| g__Enterococcus.s__Enterococcus_faecalis | 0 | 0.0000 | 0.0000 | 0.0000 | 0.3800 |
| g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0.0000 | 0.0000 | 0.0000 | 0.1882 |
| g__Enterobacter.s__Enterobacter_roggenkampii | 0 | 0.0000 | 0.0000 | 0.0000 | 0.1190 |
| g__Streptococcus.s__Streptococcus_pneumoniae | 0 | 0.0000 | 0.0000 | 0.0000 | 0.1078 |
| g__Escherichia.s__Escherichia_fergusonii | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0843 |
| g__Escherichia.s__Escherichia_marmotae | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0726 |
8.3.8.2 Italian training cohort
Open code
res <- stratIT %>%
filter(grepl("PWY-5367", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-5367: petroselinate biosynthesis` in the Italian training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| g__Escherichia.s__Escherichia_coli | 0 | 0 | 0.6316 | 1 | 1.0000 |
| g__Blautia.s__Ruminococcus_torques | 0 | 0 | 0.0000 | 0 | 1.0000 |
| g__Escherichia.s__Escherichia_marmotae | 0 | 0 | 0.0000 | 0 | 1.0000 |
| g__Leclercia.s__Leclercia_adecarboxylata | 0 | 0 | 0.0000 | 0 | 1.0000 |
| unclassified | 0 | 0 | 0.0000 | 0 | 1.0000 |
| g__Enterobacter.s__Enterobacter_roggenkampii | 0 | 0 | 0.0000 | 0 | 0.9279 |
| g__Enterobacter.s__Enterobacter_mori | 0 | 0 | 0.0000 | 0 | 0.7381 |
| g__Citrobacter.s__Citrobacter_freundii | 0 | 0 | 0.0000 | 0 | 0.7378 |
| g__Citrobacter.s__Citrobacter_braakii | 0 | 0 | 0.0000 | 0 | 0.4592 |
| g__Klebsiella.s__Klebsiella_oxytoca | 0 | 0 | 0.0000 | 0 | 0.4285 |
8.3.8.3 Czech validation cohort
Open code
res <- stratVAL %>%
filter(grepl("PWY-5367", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{
mat <- select(., -`# Pathway`)
mat_prop <- sweep(mat, 2, colSums(mat), "/")
quants <- t(apply(mat_prop, 1, function(row) {
quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))
cbind(`# Pathway` = .$`# Pathway`, quants)
} %>%
as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
res[, -1] <- lapply(res[, -1], as.numeric)
kableExtra::kable(
res %>%
arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-5367: petroselinate biosynthesis` in the Czech validation cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)| # Pathway | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| g__Blautia.s__Ruminococcus_torques | 0 | 1 | 1 | 1 | 1.0000 |
| unclassified | 0 | 0 | 0 | 0 | 1.0000 |
| g__Enterococcus.s__Enterococcus_hirae | 0 | 0 | 0 | 0 | 0.8935 |
| g__Enterobacter.s__Enterobacter_bugandensis | 0 | 0 | 0 | 0 | 0.8379 |
| g__Escherichia.s__Escherichia_coli | 0 | 0 | 0 | 0 | 0.6883 |
| g__Enterococcus.s__Enterococcus_faecium | 0 | 0 | 0 | 0 | 0.5535 |
| g__Enterococcus.s__Enterococcus_durans | 0 | 0 | 0 | 0 | 0.4465 |
| g__Klebsiella.s__Klebsiella_oxytoca | 0 | 0 | 0 | 0 | 0.3625 |
| g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0 | 0 | 0 | 0.3454 |
| g__Escherichia.s__Escherichia_fergusonii | 0 | 0 | 0 | 0 | 0.3117 |
9 Reproducibility
Open code
sessionInfo()
## R version 4.4.3 (2025-02-28)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=cs_CZ.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=cs_CZ.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=cs_CZ.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Europe/Prague
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] vegan_2.6-6.1 lattice_0.22-5 permute_0.9-7
## [4] zCompositions_1.5.0-4 truncnorm_1.0-8 NADA_1.6-1.1
## [7] survival_3.7-0 MicrobiomeStat_1.2 glmnet_4.1-8
## [10] pROC_1.18.0 arm_1.12-2 lme4_1.1-35.5
## [13] Matrix_1.7-0 MASS_7.3-65 car_3.1-2
## [16] carData_3.0-5 emmeans_1.10.4 brms_2.21.0
## [19] Rcpp_1.0.13 rms_6.8-1 Hmisc_5.1-3
## [22] glmmTMB_1.1.9 ggtext_0.1.2 ggdist_3.3.2
## [25] cowplot_1.1.1 ggpubr_0.4.0 sjPlot_2.8.16
## [28] kableExtra_1.4.0 flextable_0.9.6 gtsummary_2.0.2
## [31] compositions_2.0-8 janitor_2.2.0 stringi_1.8.7
## [34] lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1
## [37] dplyr_1.1.4 purrr_1.0.4 readr_2.1.5
## [40] tidyr_1.3.1 tibble_3.3.0 ggplot2_3.5.1
## [43] tidyverse_1.3.1 readxl_1.3.1 openxlsx_4.2.8
## [46] RJDBC_0.2-10 rJava_1.0-11 DBI_1.2.3
##
## loaded via a namespace (and not attached):
## [1] fs_1.6.6 matrixStats_1.3.0 httr_1.4.2
## [4] insight_0.20.2 numDeriv_2016.8-1.1 tools_4.4.3
## [7] backports_1.5.0 utf8_1.2.6 sjlabelled_1.2.0
## [10] R6_2.6.1 mgcv_1.9-1 withr_3.0.2
## [13] Brobdingnag_1.2-7 prettyunits_1.2.0 gridExtra_2.3
## [16] bayesm_3.1-6 quantreg_5.98 cli_3.6.5
## [19] textshaping_0.3.6 performance_0.12.2 officer_0.6.6
## [22] sandwich_3.0-1 labeling_0.4.2 mvtnorm_1.1-3
## [25] robustbase_0.93-9 polspline_1.1.25 ggridges_0.5.3
## [28] askpass_1.1 QuickJSR_1.3.1 commonmark_1.9.1
## [31] systemfonts_1.0.4 StanHeaders_2.32.10 foreign_0.8-90
## [34] gfonts_0.2.0 svglite_2.1.3 rstudioapi_0.16.0
## [37] httpcode_0.3.0 generics_0.1.4 shape_1.4.6
## [40] distributional_0.4.0 zip_2.2.0 inline_0.3.19
## [43] loo_2.4.1 abind_1.4-5 lifecycle_1.0.4
## [46] multcomp_1.4-18 yaml_2.3.10 snakecase_0.11.1
## [49] grid_4.4.3 promises_1.2.0.1 crayon_1.5.3
## [52] haven_2.4.3 pillar_1.11.0 knitr_1.50
## [55] statip_0.2.3 boot_1.3-31 estimability_1.5.1
## [58] codetools_0.2-19 glue_1.7.0 V8_4.4.2
## [61] fontLiberation_0.1.0 data.table_1.15.4 vctrs_0.6.5
## [64] cellranger_1.1.0 gtable_0.3.0 assertthat_0.2.1
## [67] datawizard_0.12.2 xfun_0.52 mime_0.12
## [70] coda_0.19-4 modeest_2.4.0 timeDate_3043.102
## [73] iterators_1.0.14 statmod_1.4.36 TH.data_1.1-0
## [76] nlme_3.1-168 fontquiver_0.2.1 rstan_2.32.6
## [79] fBasics_4041.97 tensorA_0.36.2.1 TMB_1.9.14
## [82] rpart_4.1.24 colorspace_2.0-2 nnet_7.3-20
## [85] tidyselect_1.2.1 processx_3.8.4 timeSeries_4032.109
## [88] compiler_4.4.3 curl_6.4.0 rvest_1.0.2
## [91] htmlTable_2.4.0 SparseM_1.81 xml2_1.3.3
## [94] fontBitstreamVera_0.1.1 posterior_1.6.0 checkmate_2.3.2
## [97] scales_1.3.0 DEoptimR_1.0-10 callr_3.7.6
## [100] spatial_7.3-15 digest_0.6.37 minqa_1.2.4
## [103] rmarkdown_2.27 htmltools_0.5.8.1 pkgconfig_2.0.3
## [106] base64enc_0.1-3 stabledist_0.7-2 dbplyr_2.1.1
## [109] fastmap_1.2.0 rlang_1.1.6 htmlwidgets_1.6.4
## [112] shiny_1.9.1 farver_2.1.0 zoo_1.8-9
## [115] jsonlite_2.0.0 magrittr_2.0.3 Formula_1.2-4
## [118] bayesplot_1.8.1 munsell_0.5.0 gdtools_0.3.7
## [121] stable_1.1.6 plyr_1.8.6 pkgbuild_1.3.1
## [124] parallel_4.4.3 ggrepel_0.9.5 sjmisc_2.8.10
## [127] ggeffects_1.7.0 splines_4.4.3 gridtext_0.1.5
## [130] hms_1.1.3 sjstats_0.19.0 ps_1.7.7
## [133] uuid_1.0-3 markdown_1.13 ggsignif_0.6.3
## [136] stats4_4.4.3 rmutil_1.1.10 rstantools_2.1.1
## [139] crul_1.5.0 reprex_2.0.1 evaluate_1.0.4
## [142] RcppParallel_5.1.8 modelr_0.1.8 nloptr_2.0.0
## [145] tzdb_0.5.0 foreach_1.5.2 httpuv_1.6.5
## [148] MatrixModels_0.5-3 openssl_1.4.6 clue_0.3-65
## [151] broom_1.0.6 xtable_1.8-4 rstatix_0.7.0
## [154] later_1.3.0 viridisLite_0.4.0 ragg_1.4.0
## [157] lmerTest_3.1-3 cluster_2.1.8.1 timechange_0.3.0
## [160] bridgesampling_1.1-2